Abstract — We consider the fundamental limits of the secret key 
generation problem when the sources are randomly excited by 
the sender and there is a noiseless public discussion channel. In 
many practical communication settings, the sources or channels 
may be influenced by some parties involved. Similar to recent 
works on probing capacity and channels with action-dependent 
states, our system model captures such a scenario. We derive 
single-letter expressions for the secret key capacity. Our coding 
strategy involves a key generation scheme and wiretap channel 
coding. We show that the secret key capacity is composed of 
both source- and channel-type randomness. By assuming that 
the eavesdropper receives a degraded version of the legitimate 
receiver's observation, we also obtain a capacity result that 
does not involve any auxiliary random variables, and thus it is 
amenable to numerical evaluation. By evaluating the capacity for 
several degraded channels, we show that there is a fundamental 
interplay between the portions of the secret key rate that are due to 
source-type and channel-type randomness. 
In addition, we derive lower bounds on the achievable reliability 
and secrecy exponents, i.e., the exponential rates at which the 
probability of decoding error and the information leakage decay. 
These exponents allow us to determine the set of "strongly-achievable" 
secret key rates. Our exponents explicitly capture 
the twin effects of the channel and the source in the model. The 
exponents can be specialized to previously known results. We 
also demonstrate that there is an inherent tradeoff between the 
achievable reliability and secrecy exponents. 

Index Terms — Secret key capacity, common randomness, 
wiretap channel, sender-excitation, reliability exponent, secrecy 
exponent, degraded broadcast channel, probing capacity. 



I. Introduction 

Within the realm of information-theoretic secrecy [2], the 
foundations of sharing a secret key between two parties in 
the presence of an eavesdropper were laid in [3], [4]. 
Ahlswede and Csiszár [3] studied two models: the source-type 
model with wiretapper (Model SW) and the channel-type 
model with wiretapper (Model CW). In Model SW, the users 
obtain their observations from a discrete memoryless multiple 
source (DMMS) and communicate to each other via a noiseless 
authenticated public channel. The public messages they send 
can be regarded as compressed versions of the data in a 
multi-terminal source coding problem. The information that 
is independent of the public message can be used to generate 
secret keys. In Model CW, one legitimate user (the sender) 
controls the input of a discrete memoryless broadcast channel 
(DMBC), sending information to the legitimate receiver and 
the eavesdropper. The sender randomly chooses a message 
and transmits it to the receiver. Users may also discuss over a 
public channel and generate a key based on all the available 
data sent to them by the other party. It is shown that when one-way 
public discussion is allowed, the users can adopt wiretap 
channel coding [2], [5], [6] without the public channel and 
there is no loss of the secret key rate. 

However, in many applications, the security system can 
neither be adequately modeled by a source- nor a channel- 
type model. This work explores such a setting. In a nutshell, 
we derive capacity and error exponent results for the secret 
key agreement problem when the sender has the ability to 
use her private source of randomness to excite (or influence) 
the "state" of the DMMS. This is similar in spirit to the 
recent works on probing capacity and channels with action-dependent 
states [7]-[10]. We show via examples that there is 
an interplay between the wiretap secrecy rate and the amount 
of common randomness that can be used to generate a secret 
key. Our error exponent results generalize those in Gallager's 
seminal works [11, Sec. 5.6] and [12]. They may also be 
specialized to Hayashi's recent work on the characterization 
of the rate of decay of the information leakage rate [13] of 
wiretap channels. We show that there is another tradeoff between 
the reliability and secrecy exponents. 

A. Related Work 

There are other works dealing with non-source and non-channel 
models such as [14], [15], where users observe a 
DMMS and can also transmit information via a wiretap 
channel. However, no public discussion is allowed. The 
key generation scheme is based on the observation that the 
public message, which assists in generating the key, can be 
transmitted via the DMBC confidentially, resulting in a higher 
secret key rate. In [16]-[18], public discussion is allowed and 
there may also be a helper but, unlike our work, the sender 
does not also receive a sequence as part of the channel output. 
The sender's ability to use her channel output and her source 
of private randomness to generate a secret key is one of the 
crucial components in our model. 

The authors of [19]-[23] considered the setting where a 
wiretap channel is influenced by a random state that is known 
at the sender (and possibly the receiver) and thus can be treated 
as a correlated source. In [19], [20], the sender transmits a 
confidential message and the random state, known noncausally, 
is exploited in the coding scheme to confuse the eavesdropper. 
The lower bound is proved using a combination of Gel'fand-Pinsker 
coding and wiretap channel coding. A similar problem 
but with causal state information was studied in [21], where 
the coding scheme involves block Markov coding, the Shannon 
strategy and wiretap coding. In [22], [23], the goal was to 
generate a secret key when the encoder (and/or decoders) have 
noncausal state information. The authors presented a single-letter 
expression for the secret key capacity. The resulting 
key rate consists of two parts: the first is attributed to the 
rate of the confidential message using wiretap channel coding 
while treating the state sequence as a time-sharing sequence 
(multiplexing), while the second key, independent of the first 
one, is produced by exploiting the common knowledge of the 
state at the sender and the legitimate receiver. 

Another motivation for this paper comes from the fact that in 
many applications, such as storage for computer memories, the 
system (channel) may be influenced by a probing signal that 
is controlled by some of the users (typically the sender). This 
problem was first studied in the channel coding context [7], 
where the channel state of the DMBC depends on the encoder's 
action sequence, which in turn depends on the message 
the sender intends to send. As a result, the channel is one 
whose states are "action-dependent". In [8], the availability of 
the states at the encoder is controlled by a probing (action) 
signal, which is subject to a cost constraint. Similar action-dependent 
ideas were studied in [9], [10] in the source coding 
context. However, the models studied in these works do not 
incorporate any secrecy constraints. 

The model considered in this paper is a generalization of 
the "source excitation" model of [24]. The model considered 
in that paper is also motivated by the large body of work 
on "physical-layer" security (see, e.g., [25], [26]) where the 
common source of randomness results from the inherent 
randomness of the wireless channel. One possibility is to sound 
the channel with a random signal, measuring the observations 
generated at the various receivers (and marginalizing over 
the sounding signal). This "source emulation" strategy is 
considered in [26]. Another approach, studied in [24], [25], 
uses deterministic sounding of a reciprocal wireless channel. 
Key extraction follows by denoising the observations using 
a public message. A large secret key rate can be achieved 
if a deterministic sounding signal (of a good design) is used 
and the conditional randomness generated is exploited. The 
generalization in the current model is that we now exploit 
both random sounding (using a wiretap code) and key generation 
(using conditional randomness). The important difference 
between the models is that in [24] the only randomness was in 
the channel; all other functions were deterministic. In contrast, 
now Alice has a private source of randomness. We regard 
the current model as a stepping stone to understanding the 
fundamental limits of two-way randomized channel sounding 
in which the secrecy rate is derived from the use of two wiretap 
codes and from the conditional randomness produced. 
Fig. 1. Our problem setup: Based on her private source of randomness $M$, 
Alice excites the channel via the sounding signal $S^n(M)$. She generates 
a public message $\Phi(M, X^n)$, which is transmitted through the noiseless 
public channel and hence known to all parties. Alice and Bob generate keys 
$K_A(M, X^n)$ and $K_B(\Phi, Y^n)$ respectively. The keys should agree, while at 
the same time, they should be kept secret from Eve. 



B. Main Contributions: Capacity and Error Exponents 

In this paper, we consider a system with the model shown 
in Fig. 1. We can think of the terminal labeled Alice as 
a base station on earth equipped with a sensor. This base 
station transmits a random message $M$ securely to a satellite 
encoder which produces a satellite sequence $S^n$ according 
to some conditional probability law. The satellite sequence 
is the input to a broadcast channel $p(x,y,z|s)$ (the wireless 
medium). The channel produces data $X^n$, $Y^n$ and $Z^n$ that are 
received respectively by the sensor located at the original base 
station, the sensor at the base station labeled Bob, and a 
malicious sensor labeled Eve. The two legitimate base stations 
would then like to generate a secret key: Alice given $(M, X^n)$ 
and Bob given $(\Phi, Y^n)$, where $\Phi$ is some public message that 
is known to all parties on terra firma. This generalizes the 
model of [24] in the sense that the input sounding signal can 
be randomly selected by the sender based on her private source 
of randomness and the information about the chosen sounding 
sequence is protected by using wiretap channel coding. This 
allows us to optimize over the distribution of the sounding 
signal to maximize the secret key rate. In this paper, we present 
two main results: a capacity theorem and an error exponent 
theorem for such a system. 

For the capacity theorem, we provide a single-letter expression 
for the secret key capacity of the sender-excited secret 
key agreement model. The capacity-achieving coding scheme 
is one in which the optimal tradeoff between two coding 
strategies has to be found: (i) Treat the DMBC $p(y,z|s)$ as 
a wiretap channel [2], [5], [6] and apply wiretap channel 
coding, and (ii) Treat the channel outputs $(X, Y, Z)$ as excited 
correlated sources and use the key generation scheme of [3] 
to extract the key. We demonstrate this tradeoff by using 
examples in which the channel is degraded in favor of the 
legitimate receiver. 

For the error exponent theorem, we derive exponentially 
decaying rates for two types of "errors": (i) the decoding error 
probability at the legitimate receiver, which corresponds to 
the reliability exponent, and (ii) the key leakage rate to the 
eavesdropper, which corresponds to the secrecy exponent. In 
the analysis of the reliability exponent, the legitimate receiver 
uses a combination of a maximum likelihood and a maximum 
a-posteriori (ML-MAP) decoder to jointly decode the sender's 
source $X^n$ and her private source of randomness (or message) 
$M$. The resulting reliability exponent expression can be 
specialized to Gallager's channel coding error exponent [11, 
Sec. 5.6] and Gallager's source coding error exponent [12]. 
On the other hand, in the key leakage analysis, the secrecy 
exponent we derive captures the leakage due to Eve's channel 
$p(z|s)$ and the leakage due to the correlation between Alice's 
variable $X$ and Eve's variable $Z$ in a transparent manner. Our 
analysis builds on the work by Hayashi in [13], [27], where 
he links the leakage rate of a wiretap channel to channel 
resolvability and identification coding [28]. This connection 
is also examined by Bloch and Laneman [29], where they derive 
the capacity of general wiretap channels from an information 
spectrum perspective [28]. Our secrecy exponent results, which 
are developed in Section IV, can be specialized to that for 
the wiretap channel [13], [27] and that for secret key 
generation from correlated sources [13], [24], [30], [31]. 
The difference vis-a-vis [24] is that the methods to bound the 
exponents for both reliability and secrecy involve both the 
wiretap channel coding and the source coding. This will 
become clear from our discussion in Section IV, where we 
specialize the result in this paper to various known models. 
Thus the model here is more general than existing ones. Note 
that the criterion for exponential decay is much stronger than 
the usual strong secrecy ID. We focus on this exponential 
notion because it quantifies how fast the error probability and 
the information leakage decay to zero. 

C. Paper Organization 

This paper is organized as follows: In Section II, we 
describe the system model, i.e., the key generation protocol. 
We also define the secret key capacity, the capacity-reliability-secrecy 
region and the notion of channel degradedness. Our 
main results pertaining to the secret key capacity are provided 
in Section III. We also prove a looser upper bound for the 
secret key capacity that does not contain any auxiliary random 
variables, and hence is amenable to evaluation. We show that 
this upper bound is in fact tight for degraded channels. We 
present the reliability and secrecy exponents in Section IV and 
make connections to previous works. In Section V, we present 
several examples to demonstrate how the preceding theorems 
can be applied to channels of interest. We show that there is an 
inherent tradeoff between the secret key rate derived from the 
source- and from the channel-type randomness. We also show 
the inherent tradeoff between the reliability exponent and the 
secrecy exponent. We conclude our discussion by suggesting 
avenues for future research in Section VI. The proofs of 
the capacity theorems and the error exponent theorems are 
provided in Section VII and Section VIII, respectively. 

D. Notation 

We generally adopt the notational conventions in the book 
by El Gamal and Kim [32], some of which we recap here. 
All logarithms are to the base 2. Random variables are in 
upper case (e.g., $X$) and their realizations in lower case 
(e.g., $x$). The corresponding alphabets of random variables 
are in calligraphic font (e.g., $\mathcal{X}$) and so are all sets and 
events. For vectors, $X_j^i = (X_j, \ldots, X_i)$ and if 
$j = 1$, the abbreviation $X^i = X_1^i$ is used. In addition, 
$X^{n \setminus i} \triangleq (X^{i-1}, X_{i+1}^n)$. The probability mass function (pmf) 
and probability density function (pdf) of a random variable 
$X$ are denoted as $p_X(x)$. The function $p(x) = p_X(x)$ without 
the subscript is understood as the pmf or pdf of the random 
variable $X$. Random codebooks are denoted by a special script 
font (e.g., $\mathscr{C}$) while a codebook realization is denoted as $\mathcal{C}$. The set 
of nonnegative real numbers is denoted by $\mathbb{R}_+$. For an $a > 0$, 
we also commonly use the notation $[1 : 2^a] = \{1, \ldots, 2^{\lceil a \rceil}\}$. 

II. Problem Setup 
A. The Secret Key Generation Protocol 

The setting is shown in Fig. 1. Consider a 3-receiver DMBC 
$(\mathcal{S}, p(x,y,z|s), \mathcal{X} \times \mathcal{Y} \times \mathcal{Z})$ consisting of four finite sets 
$\mathcal{S}, \mathcal{X}, \mathcal{Y}, \mathcal{Z}$ and a collection of conditional pmfs $p(x,y,z|s)$ 
on $\mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$. The sender, Alice at terminal $\mathcal{X}$, controls the 
channel input sounding signal $s^n$ through the encoder via $n$ 
uses of the channel. Alice has a private source of randomness 
used to select an index $m$, which influences $s^n$. The legitimate 
receiver at terminal $\mathcal{Y}$ is known as Bob and the eavesdropper 
at terminal $\mathcal{Z}$ is known as Eve. There is also a noiseless public 
discussion channel which allows Alice to transmit a message 
$\Phi$ to Bob and Eve. A $(2^{nR_M}, 2^{nR_\Phi}, n, \Gamma)$ code for the secret 
key generation protocol consists of: 

1) Channel Excitation: Alice first selects a message 
$m \in [1 : 2^{nR_M}]$ uniformly at random. Then, the (satellite) encoder 
randomly chooses a message-dependent input sequence 
$S^n \sim p_{S^n|M}(s^n|m)$ such that the random codeword $S^n$ 
to the channel satisfies the condition that the average cost 
is no larger than some $\Gamma$ as $n \to \infty$, i.e., 

$$\lim_{n \to \infty} P\left( \frac{1}{n} \sum_{i=1}^{n} \Lambda(S_i) \le \Gamma \right) = 1, \tag{1}$$

where $\Lambda : \mathcal{S} \to \mathbb{R}_+$ is a per-letter cost function and 
$\Gamma > 0$ denotes the cost. We also abuse notation to write 
$\Lambda(s^n) \triangleq \frac{1}{n} \sum_{i=1}^{n} \Lambda(s_i)$. The constraint in (1) means 
that the probability that the average cost exceeds $\Gamma$ is 
arbitrarily small for $n$ sufficiently large. The sequence 
$S^n = s^n$ is transmitted over $n$ channel uses. The output 
sequences $x^n$, $y^n$ and $z^n$ are observed by Alice, Bob 
(legitimate receiver) and Eve (eavesdropper) respectively. 

2) (One-Way) Public Discussion: After observing $x^n$, Alice 
generates a one-way public message 
$\phi = \phi(m, x^n) \in [1 : 2^{nR_\Phi}]$, and transmits it over a noiseless public channel. 

3) Key Generation: Alice generates a key 
$k_A = k_A(m, x^n) \in \mathbb{N}$. After receiving his channel output $y^n$ 
and the public message $\phi$, Bob generates another key 
$k_B = k_B(\phi, y^n) \in \mathbb{N}$. 

Here, we note that it may be plausible that if the encoder 
(described in step 1 above) has an additional source of 
randomness to augment $R_M$, then we may potentially extract 
a higher secret key rate. We can represent this situation by 
using a mapping from $m$ to $s^n$ via the conditional probability 
law $s^n \sim p_{S^n|M'}(s^n|m')$ where $m' = (m, m_0)$ and $m_0$ 
is randomness private to the satellite and not known by 
Alice. However, it can be distilled from our converse that, 
in fact, augmenting $R_M$ does not help. This follows for two 
reasons. First, we impose no rate constraints on $R_M$ and thus 
Alice could feed to the encoder the random seed $m_0$ used 
to randomize the mapping (and perhaps then forget it, 
thereby emulating a separate source of private 
randomness at the satellite). Second, in our converse we do 
not impose that the message $M$ be recovered by Bob reliably. 
(In essence, we do not apply Fano's inequality to $M$ and claim 
that $H(M|Y^n, \Phi) \le o(n)$.) 

Note that the conditional distribution of $(X, Y, Z)$ given $S$ can 
be factorized as $p(x|s)p(y, z|x, s)$. The first conditional distribution 
$p(x|s)$ can be roughly thought of as Alice's influence on 
the channel state via the sounding signal $s^n$, while the second 
$p(y,z|x,s)$ can be thought of as a state-dependent channel. 
The variables $S$ and $X$ are available at Alice but she can 
only control the sounding signal $S$, which in turn triggers the 
channel, giving her $X$. As mentioned in the Introduction, the 
model we study in this paper involves a probing mechanism. 
This is analogous to the models studied in [7]-[10], in which 
the channel (or source) is influenced by a sequence of actions 
but there is no secrecy requirement. The main difference from 
[7]-[10] is that in our model, we consider only one DMBC 
$p(x, y, z|s)$, thus the chosen channel input $s^n$ does not depend 
on the observation $x^n$. However, the sender Alice uses both 
$x^n$ and $s^n$ to generate a key $k_A$ in the subsequent public 
messaging step. 

B. Definitions 

We now provide the definitions of achievable secret key 
rates, secret key capacity and error exponents. As a reminder, 
the random variables $K_A$ and $K_B$ respectively denote Alice's 
and Bob's key. The public message is denoted as $\Phi$. 

Definition 1 (Weak Achievability). The number $R_{SK}$ is said 
to be a $\Gamma$-weakly-achievable (or simply $\Gamma$-achievable) secret 
key rate if there exists a sequence of $(2^{nR_M}, 2^{nR_\Phi}, n, \Gamma)$ codes 
for the secret key generation protocol such that the following 
three conditions are satisfied: 

$$\lim_{n \to \infty} P(K_A \ne K_B) = 0, \tag{2}$$
$$\lim_{n \to \infty} \frac{1}{n} I(K_A; Z^n, \Phi) = 0, \tag{3}$$
$$\liminf_{n \to \infty} \frac{1}{n} H(K_A) \ge R_{SK}. \tag{4}$$



Definition 2 (Secret Key Capacity). The secret key capacity-cost 
function $C_{SK}(\Gamma)$ is the supremum of all $\Gamma$-weakly-achievable 
secret key rates $R_{SK}$, i.e., 

$$C_{SK}(\Gamma) = \sup\{R_{SK} : R_{SK} \text{ is } \Gamma\text{-weakly-achievable}\}. \tag{5}$$

We will henceforth say that $C_{SK}(\Gamma)$ is the secret key 
capacity (without reference to the cost $\Gamma$). The first condition 
in (2), known as the reliability condition, implies that we would 
like Alice's and Bob's keys to agree with high probability. 
The second condition in (3), known as the secrecy condition, 
requires that the eavesdropper should not be able to estimate 
the key $K_A \in [1 : 2^{nR_{SK}}]$ given her sequence $Z^n$ and the 
public message $\Phi$. This is manifested in the fact that the key 
leakage rate $\frac{1}{n} I(K_A; Z^n, \Phi)$ should be arbitrarily small for 
sufficiently large block length $n$. In other words, we would like 
$K_A$ and $(Z^n, \Phi)$ to be independent asymptotically in the sense 
that the key leakage $I(K_A; Z^n, \Phi) = o(n)$. The rate condition 
in (4) implies that the normalized entropy of $K_A$ should be close to $R_{SK}$. 
In other words, the pmf of $K_A$ should be close to a 
uniform pmf supported on $[1 : 2^{nR_{SK}}]$, so the eavesdropper 
can only glean a negligible amount of information. 

In many practical settings, the fact that the error probability 
in (2) and the key leakage rate in (3) can be made arbitrarily 
small with increasing block length is insufficient. See Maurer's 
work in [33] and a more recent exposition in [29]. It would, 
in fact, be desirable to quantify their rates of decay and 
to devise coding schemes to ensure that these rates are as 
large as possible. We formalize this by defining the notion 
of an achievable secret key rate-exponent triple. To simplify 
the exposition, in our definitions (and corresponding results) 
of rates with exponents, we will assume that $\Gamma = \infty$. In 
other words, we do not impose a cost constraint on the input 
codewords as in (1). For the sake of brevity, we will call a 
$(2^{nR_M}, 2^{nR_\Phi}, n, \Gamma = \infty)$ code a $(2^{nR_M}, 2^{nR_\Phi}, n)$ code. 

Definition 3 (Achievable Secret Key Rate-Exponent Triple). 
The secret key rate-exponent triple $(R_{SK}, E, F) \in \mathbb{R}_+^3$ is 
achievable if there exists a sequence of $(2^{nR_M}, 2^{nR_\Phi}, n)$ codes 
for the secret key generation protocol such that in addition 
to (4), the following hold: 

$$\liminf_{n \to \infty} -\frac{1}{n} \log P(K_A \ne K_B) \ge E, \tag{6}$$
$$\liminf_{n \to \infty} -\frac{1}{n} \log I(K_A; Z^n, \Phi) \ge F. \tag{7}$$

In (6), $E$ is known as the reliability exponent and in (7), $F$ 
is known as the secrecy exponent. Collectively, $E$ and $F$ are 
known as error exponents (though $I(K_A; Z^n, \Phi)$ is not, strictly 
speaking, an error probability but we abuse terminology to say 
that both are "errors"). Definition 3 can also be interpreted as 
follows: If a triple $(R_{SK}, E, F)$ is achievable, then the error 
probability in (6) decays$^1$ as $P(K_A \ne K_B) \,\dot{\le}\, 2^{-nE}$ and the 
key leakage decays as $I(K_A; Z^n, \Phi) \,\dot{\le}\, 2^{-nF}$. Naturally, the 
constraint on the entropy of the secret key in (4) is retained 
in the above definition. 
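
To make the exponent definitions (6) and (7) concrete, the following sketch evaluates the normalized quantity $-\frac{1}{n} \log_2 a_n$ for a hypothetical sequence $a_n = n^2 2^{-nE}$. The sequence, the value E = 0.5 and the function names are illustrative assumptions and do not come from the paper. The point is only that a polynomial prefactor does not change the exponent, which is what the dotted-inequality notation expresses. The computation is carried out in the log domain to avoid floating-point underflow.

```python
import math

def log2_sequence(n, exponent, poly_degree=2):
    """log2 of the hypothetical sequence a_n = n**poly_degree * 2**(-n * exponent)."""
    return poly_degree * math.log2(n) - n * exponent

def normalized_exponent(n, exponent, poly_degree=2):
    """Finite-n value of -(1/n) * log2(a_n); tends to `exponent` as n grows."""
    return -log2_sequence(n, exponent, poly_degree) / n

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, round(normalized_exponent(n, exponent=0.5), 4))
```

For small n the polynomial factor dominates and the finite-n value can even be negative; only the limiting behaviour matters for achievability of a triple.
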

Definition 4 (Capacity-Reliability-Secrecy Region). The (secret 
key) capacity-reliability-secrecy region $\mathcal{R} \subset \mathbb{R}_+^3$ is the 
closure of the set of achievable secret key rate-exponent triples. 

In analogy to the notion of weak achievability, we can also 
define a more stringent notion known as strong achievability, 
also studied in 



Definition 5 (Strong Achievability). The secret key rate $R_{SK}$ 
is strongly-achievable if $(R_{SK}, E, F)$ is achievable for some 
$E > 0$ and $F > 0$. 

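$^1$Here and in the following, for a pair of positive sequences $\{(a_n, b_n)\}_{n \in \mathbb{N}}$, 
we say that $a_n \,\dot{\le}\, b_n$ if $\limsup_{n \to \infty} n^{-1} \log(a_n / b_n) \le 0$. The notation 
$\dot{\ge}$ is defined analogously. We say that $a_n \doteq b_n$ if $a_n \,\dot{\le}\, b_n$ and $a_n \,\dot{\ge}\, b_n$. 
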
We conclude our suite of definitions by formalizing the 
notion of degraded channels. 

Definition 6 (Degradedness). We say that the DMBC 
$p(x,y,z|s)$ is degraded if $(X,S) - Y - Z$ form a Markov 
chain, i.e., $p(y, z|x, s) = p(y|x, s)p(z|y)$. 

In this case, we also say that the DMBC $p(x, y, z|s)$ is degraded 
in favor of Bob or equivalently that Eve's observation is 
a degraded version of Bob's. Note that we do not differentiate 
between physical and stochastic degradedness [32, Ch. 5]. The 
capacity results will turn out to be identical for both cases. 

III. Capacity Theorems 

We present our capacity theorems in this section. These 
correspond to Definitions 1 and 2 in the previous section. 
We give a single-letter expression for the secret key capacity 
containing three auxiliary random variables. We also provide 
a looser upper bound in which there are no auxiliary random 
variables in the expression. The upper bound is tight when the 
system is such that the channel is degraded in favor of Bob, 
the legitimate receiver (as per Definition 6). 

A. Main Capacity Result 

Theorem 1 (Secret Key Capacity). The secret key capacity of the 
DMBC $(\mathcal{S}, p(x,y,z|s), \mathcal{X} \times \mathcal{Y} \times \mathcal{Z})$ is 

$$C_{SK}(\Gamma) = \max \left[ I(U, V; Y|W) - I(U, V; Z|W) \right], \tag{8}$$

where the maximization is over all joint distributions that 
factor in accordance with 

$$p(w,u,v,s,x,y,z) = p(w, u)\, p(x, s|u)\, p(v|w, u, x)\, p(y, z|x, s) \tag{9}$$

such that $E[\Lambda(S)] \le \Gamma$ (also see the equivalent Markov conditions 
in (13) and (14)). Furthermore, the cardinalities of the 
auxiliary random variables $W$, $U$ and $V$ may be restricted to 



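$$|\mathcal{W}| \le |\mathcal{S}| + 7, \tag{10}$$
$$|\mathcal{U}| \le (|\mathcal{S}| + 5)(|\mathcal{S}| + 7), \tag{11}$$
$$|\mathcal{V}| \le |\mathcal{X}| (|\mathcal{S}| + 5)(|\mathcal{S}| + 7)^2. \tag{12}$$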


A Sketch of the Proof: The converse of Theorem 1 is given 
in Section VII-A. It relies on a simple application of the 
Csiszár sum identity and an appropriate identification of the 
auxiliary random variables that satisfy the Markov conditions 
in (13) and (14). 

The direct part, which involves wiretap channel coding and a 
key generation scheme, is proven in Section VII-B. Firstly, we 
generate cloud centers $U^n$ and satellite codewords $V^n$. These 
cloud centers and satellite codewords are assigned random 
bin indices via a double random binning step. See Fig. 10. 
Secondly, $(U^n, V^n)$ are used to "cover" Alice's observation 
$X^n$. That is, the number of codeword pairs $(U^n, V^n)$ is 
sufficiently large such that with high probability there exists 
a pair $(U^n, V^n)$ jointly typical with $X^n$. Alice then generates the 
secret key and the public message based on the bin indices 
of $(U^n, V^n)$. The random binning step ensures that the rate 
is reduced and thus Bob can decode the cloud center $U^n$ 
reliably. Part of the rate (normalized logarithm of the size 
of the codebook) is allocated for generation of the secret 
key and the other part for generation of the public message, 
which roughly corresponds to information about the satellite 
$V^n$ conditioned on a cloud center $U^n$. Bob can decode 
the satellite codeword reliably with the help of the public 
message. Thirdly, Bob generates his key corresponding to the 
bin indices of $(U^n, V^n)$. The auxiliary random variable $W$ 
essentially plays the role of a time-sharing variable. Finally, 
the key rate is designed to be small enough so that the rate 
of satellite codewords given a public message is larger than 
the capacity of Eve's channel by at least the key rate. This 
guarantees that the information leakage to Eve is small. After 
the statement of Corollary 5, we provide another proof sketch 
for the degraded case in which there are no auxiliary random 
variables. The proof of the cardinality bounds is sketched 
in Section VII-C. These bounds ensure that the secret key 
capacity is, in principle, computable. 

Lemma 2 (Properties of $C_{SK}(\Gamma)$). The function 
$C_{SK} : (0, \infty) \to \mathbb{R}_+$ is non-decreasing, concave and continuous. 

We provide an operational proof of Lemma 2 in Section VII-D. 
Note that the joint distribution factorizes as in (9) 
if and only if the following Markov chains hold: 

$$W - U - (S, X) - (Y, Z), \tag{13}$$
$$V - (W, U, X) - (S, Y, Z). \tag{14}$$

This can be seen via the following set of equalities: 

\begin{align}
p(w, u, v, s, x, y, z) &= p(v|w, u, x)\, p(w, u, s, x, y, z) \tag{15} \\
&= p(v|w, u, x)\, p(w, u, s, x)\, p(y, z|w, u, x, s) \tag{16} \\
&= p(v|w, u, x)\, p(w, u)\, p(x, s|w, u)\, p(y, z|x, s) \tag{17} \\
&= p(v|w, u, x)\, p(w, u)\, p(x, s|u)\, p(y, z|x, s), \tag{18}
\end{align}

where (15) is an application of Bayes' rule and (14), (16) is 
an application of Bayes' rule, (17) is an application of Bayes' 
rule and (13), and finally (18) is by another application of (13). 
Equation (18) is precisely the factorization in (9). 

Furthermore, observe that the rate in (8) can be written as the 
sum of two rates $R_{SK} = R_{ch} + R_{src}$, where 

$$R_{ch} = I(U; Y|W) - I(U; Z|W), \tag{19}$$
$$R_{src} = I(V; Y|W, U) - I(V; Z|W, U). \tag{20}$$

The first rate $R_{ch}$ can be interpreted as the confidential 
message rate of the wiretap channel $p(y,z|s)$ [6] (the final term 
in (9)). The second rate $R_{src}$ is the secret key rate from the excited 
correlated source $(X, Y, Z)$ previously studied in [24] for a 
particular deterministic sounding signal $s^n$. This is elucidated 
in the second Markov chain in (14). Here the sounding signal 
$S^n$ is randomly chosen by Alice based on her private source 
of randomness. As such, we can optimize over the distribution 
in (9) to find the largest "sum rate" $R_{ch} + R_{src}$. It turns out 
that there is a natural interplay between $R_{ch}$ and $R_{src}$. We see 
this using an example in Section V-B. The decomposition of 
$R_{SK}$ into the sum $R_{ch} + R_{src}$ guides the proof of the direct 
part of Theorem 1. 
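
The decomposition $R_{SK} = R_{ch} + R_{src}$ is the chain rule for mutual information applied to each of the two terms in (8), so it holds for any joint pmf of $(W, U, V, Y, Z)$. The sketch below checks this numerically on a randomly drawn pmf over binary alphabets. The random pmf, the seed and all function names are illustrative assumptions; no attempt is made to enforce the factorization (9) or to carry out the maximization.

```python
import itertools
import math
import random

def random_pmf(num_vars, seed=1):
    """Random strictly positive joint pmf over `num_vars` binary variables."""
    rng = random.Random(seed)
    weights = {o: rng.random() + 1e-3
               for o in itertools.product(range(2), repeat=num_vars)}
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}

def entropy(pmf, coords):
    """Entropy in bits of the marginal on the coordinate indices in `coords`."""
    marginal = {}
    for outcome, prob in pmf.items():
        key = tuple(outcome[i] for i in coords)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

def cond_mutual_info(pmf, a, b, c=()):
    """Conditional mutual information I(A; B | C) in bits."""
    a, b, c = tuple(a), tuple(b), tuple(c)
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))

W, U, V, Y, Z = range(5)  # coordinate order of each outcome tuple
pmf = random_pmf(5)

# Objective of (8), and the two rates in (19) and (20).
r_sk = (cond_mutual_info(pmf, (U, V), (Y,), (W,))
        - cond_mutual_info(pmf, (U, V), (Z,), (W,)))
r_ch = (cond_mutual_info(pmf, (U,), (Y,), (W,))
        - cond_mutual_info(pmf, (U,), (Z,), (W,)))
r_src = (cond_mutual_info(pmf, (V,), (Y,), (W, U))
         - cond_mutual_info(pmf, (V,), (Z,), (W, U)))

print(r_sk, r_ch + r_src)
```

For an arbitrary pmf the individual rates may be negative; it is only the identity between the two printed numbers that is being checked.
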
As a by-product of the code construction in the proof of 
Theorem 1, we obtain the following more general version of 
the same theorem. 

Theorem 3 (Secret Key Capacity With Rate Constraint 
on Public Message). The secret key capacity of the DMBC 
$(\mathcal{S}, p(x,y,z|s), \mathcal{X} \times \mathcal{Y} \times \mathcal{Z})$ with rate constraint $R_\Phi$ on 
the public message is lower bounded as 

$$C_{SK}(R_\Phi, \Gamma) \ge \max \left[ I(U, V; Y|W) - I(U, V; Z|W) \right], \tag{21}$$

where the maximization is over all joint distributions that 
satisfy the Markov conditions in (13) and (14) and are such 
that $E[\Lambda(S)] \le \Gamma$ and 

$$R_\Phi \ge I(V; X|U, W) - I(V; Y|U, W). \tag{22}$$

B. Capacity Results for Degraded Channels 

To find the secret key capacity for specific channels, three 
auxiliary random variables $W$, $U$ and $V$ attaining the maximum in (8) and 
satisfying (13) and (14) have to be identified. This may be a 
difficult task. In the next proposition, we provide an (albeit 
looser) upper bound which does not involve any auxiliary 
random variables. This result will turn out to be important 
in Section V, where we present several channels for which 
we identify the secret key capacities in closed form. In the 
following, we will assume that there is no rate constraint on 
the public message. 

Proposition 4 (Upper Bound on Secret Key Capacity). The 
secret key capacity is upper bounded as 

$$C_{SK}(\Gamma) \le \max I(X, S; Y|Z), \tag{23}$$

where the maximization is over all input distributions $p(s)$ 
such that $E[\Lambda(S)] \le \Gamma$. 



The proof of this proposition is given in Section VII-E. 
Roughly speaking, the expression in (23) can be interpreted 
as the secret key capacity when Alice and Bob have full 
knowledge (side information) of Eve's observation $Z$, hence 
the conditioning on $Z$. Indeed, in the case of degraded 
$p(x,y,z|s)$, the result in Proposition 4 is tight. 

Corollary 5 (Secret Key Capacity of Degraded Channels). If 
the DMBC $p(x, y, z|s)$ is degraded, the secret key capacity is 

$$C_{SK}(\Gamma) = \max \left[ I(X, S; Y) - I(X, S; Z) \right], \tag{24}$$

where the maximization is over all input distributions $p(s)$ 
such that $E[\Lambda(S)] \le \Gamma$. 

Proof: Let $S$ have the distribution $p(s)$ that achieves the upper 
bound in Proposition 4. The secret key capacity of the 
degraded DMBC can be upper bounded as 

\begin{align}
C_{SK}(\Gamma) &\le I(X, S; Y|Z) \nonumber \\
&= I(X, S; Y, Z) - I(X, S; Z) \nonumber \\
&= I(X, S; Y) - I(X, S; Z). \tag{25}
\end{align}

The last equality is due to the fact that $(X, S) - Y - Z$ form 
a Markov chain. On the other hand, the upper bound (25) can 
be achieved by the specific choice of $W = \emptyset$, $U = S$, and 
$V = X$ in (8). $\square$ 



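Fig. 2. Illustration of code construction for the degraded case. The number
of selected excitation sequences (denoted as solid black dots in the typical set
of $S$, namely $\mathcal{T}_\epsilon^{(n)}(S)$) is approximately $2^{nI(S;Y)}$. The figure also shows the typical sets $\mathcal{T}_\epsilon^{(n)}(Y)$ and $\mathcal{T}_\epsilon^{(n)}(Z)$.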



The extra key rate $R_{\mathrm{ch}}$ resulting from Alice's private source
can be understood by the illustration of codebook construction
in Fig. 2. For a fixed excitation sequence $s^n(1)$, the achievable
key rate is $R_{\mathrm{src}} = I(X;Y|S) - I(X;Z|S)$, cf. [24]. If Alice
were to add another sounding sequence, say $s^n(2)$, and randomly
(and uniformly) select between $s^n(1)$ and $s^n(2)$ then,
with high probability, Bob could differentiate between the two.
Thus, Alice can continue to add to her ensemble of sounding
sequences until they can no longer be packed into Bob's
channel output without overlap between the conditionally
typical output sets. The normalized log cardinality of the set of
sounding signals is $I(S;Y)$. But these must be binned to keep
Eve in the dark, i.e., to ensure that $(Z^n, \Phi)$ is asymptotically
independent of $K_A$. After binning, the additional contribution
to the key rate is $R_{\mathrm{ch}} = I(S;Y) - I(S;Z)$, giving overall key
rate $R_{\mathrm{src}} + R_{\mathrm{ch}} = I(X,S;Y) - I(X,S;Z)$. We provide an
alternative proof of the capacity of degraded channels via the
error exponent route in the next section.
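The decomposition of the key rate into source-type and channel-type parts, and the identity used in (25), can be checked numerically. The following is a minimal sketch, assuming a hypothetical binary DMBC that is made degraded by construction (a randomly drawn $p(x,y|s)$ followed by a randomly drawn $p(z|y)$); the helper names are illustrative and not part of the paper.

```python
import numpy as np

def mutual_information(pab):
    """I(A;B) in bits for a joint pmf given as a 2-D array pab[a, b]."""
    prod = pab.sum(axis=1, keepdims=True) * pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log2(pab[mask] / prod[mask])))

def cond_mutual_information(pabc):
    """I(A;B|C) in bits for a joint pmf pabc[a, b, c]."""
    total = 0.0
    for c in range(pabc.shape[2]):
        pc = pabc[:, :, c].sum()
        if pc > 0:
            total += pc * mutual_information(pabc[:, :, c] / pc)
    return total

rng = np.random.default_rng(0)
# Hypothetical model (an assumption for illustration): binary alphabets.
ps = rng.dirichlet(np.ones(2))                               # p(s)
pxy_s = rng.dirichlet(np.ones(4), size=2).reshape(2, 2, 2)   # p(x,y|s), axes [s, x, y]
pz_y = rng.dirichlet(np.ones(2), size=2)                     # p(z|y), axes [y, z]

# p(s,x,y,z) = p(s) p(x,y|s) p(z|y), so (X,S) - Y - Z is a Markov chain.
p = ps[:, None, None, None] * pxy_s[:, :, :, None] * pz_y[None, None, :, :]

p_sxy = p.sum(axis=3)      # axes [s, x, y]
p_sxz = p.sum(axis=2)      # axes [s, x, z]

# R_src = I(X;Y|S) - I(X;Z|S); the conditioning variable goes on the last axis.
r_src = (cond_mutual_information(np.transpose(p_sxy, (1, 2, 0)))
         - cond_mutual_information(np.transpose(p_sxz, (1, 2, 0))))
# R_ch = I(S;Y) - I(S;Z)
r_ch = mutual_information(p.sum(axis=(1, 3))) - mutual_information(p.sum(axis=(1, 2)))
# I(X,S;Y) - I(X,S;Z), treating the pair (S, X) as one variable with 4 values.
total = (mutual_information(p_sxy.reshape(4, 2))
         - mutual_information(p_sxz.reshape(4, 2)))
# I(X,S;Y|Z), the upper bound of Proposition 4 evaluated at this p(s).
upper = cond_mutual_information(p.reshape(4, 2, 2))

print(f"R_src + R_ch = {r_src + r_ch:.6f}, I(X,S;Y) - I(X,S;Z) = {total:.6f}, "
      f"I(X,S;Y|Z) = {upper:.6f}")
```

The first two printed values agree by the chain rule for any joint distribution; the third agrees with them only because the Markov chain $(X,S) - Y - Z$ was imposed when building the joint pmf.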



IV. Error Exponent Theorems 

In this section, we present an inner bound for the secret 
key capacity-reliability-secrecy region per Definition ID Our 
general result is then specialized to other known results in 
the literature. Recall from the discussion prior to Definition 3
that for the error exponent results, we consider the case when
there is no cost constraint on the codewords for simplicity (i.e.,
$\Gamma = \infty$).

Note that the decoding error probability $\mathbb{P}(K_A \ne K_B)$
is only a function of the marginal distribution $p(x,y,s) =
p(s)p(x,y|s)$ and the key leakage $I(K_A; Z^n, \Phi)$ is only a
function of the marginal distribution $p(x,z,s)$ if we employ
the strategy proposed in this paper, i.e., a random binning
achievability scheme. We can thus characterize the achievable
reliability and secrecy exponents separately as functions of
each marginal distribution.

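Before we present our result, let us begin with a few
definitions. Let

\[ E_0(p(s), \rho, R_\Phi, R_M) \triangleq \rho(R_\Phi - R_M) - \log \sum_{y} \Big[ \sum_{s,x} p(s)\, p(x,y|s)^{\frac{1}{1+\rho}} \Big]^{1+\rho}, \tag{26} \]

\[ E_0(p(s), R_\Phi, R_M) \triangleq \max_{0 \le \rho \le 1} E_0(p(s), \rho, R_\Phi, R_M). \tag{27} \]

Similarly, define

\[ F_0(p(s), \alpha, R_{\mathrm{SK}}, R_\Phi, R_M) \triangleq -\alpha(R_{\mathrm{SK}} + R_\Phi - R_M) - \log \sum_{x,z,s} p(x,z,s) \Big[ \frac{p(x,z|s)}{p(z)} \Big]^{\alpha}, \tag{28} \]

\[ F_0(p(s), R_{\mathrm{SK}}, R_\Phi, R_M) \triangleq \sup_{0 < \alpha \le 1} F_0(p(s), \alpha, R_{\mathrm{SK}}, R_\Phi, R_M). \tag{29} \]

We now define a rate-exponent region parameterized by the input
distribution $p(s)$ and the pair of auxiliary rates $(R_\Phi, R_M)$:

\begin{align}
\mathcal{R}(p(s), R_\Phi, R_M) \triangleq \big\{ (R_{\mathrm{SK}}, E, F) \in \mathbb{R}^3 :\ & E \le E_0(p(s), R_\Phi, R_M), \nonumber\\
& F \le F_0(p(s), R_{\mathrm{SK}}, R_\Phi, R_M) \big\}. \tag{30}
\end{align}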
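The exponent functions in (26)-(29) are straightforward to evaluate for small alphabets. The following is a minimal sketch, assuming natural logarithms (rates in nats), a hypothetical randomly drawn binary DMBC, illustrative rate values, and a simple grid search in place of the exact optimization over $\rho$ and $\alpha$; none of these choices are taken from the paper.

```python
import numpy as np

def reliability_exponent(ps, pxy_s, rho, r_phi, r_m):
    """E_0(p(s), rho, R_Phi, R_M) as in (26); rates in nats.

    ps[s] = p(s) and pxy_s[s, x, y] = p(x, y | s).
    """
    inner = np.einsum("s,sxy->y", ps, pxy_s ** (1.0 / (1.0 + rho)))
    return rho * (r_phi - r_m) - np.log(np.sum(inner ** (1.0 + rho)))

def secrecy_exponent(ps, pxz_s, alpha, r_sk, r_phi, r_m):
    """F_0(p(s), alpha, R_SK, R_Phi, R_M) as in (28); rates in nats.

    pxz_s[s, x, z] = p(x, z | s); p(z) is assumed to have full support.
    """
    pxzs = ps[:, None, None] * pxz_s          # joint pmf p(s, x, z)
    pz = pxzs.sum(axis=(0, 1))                # marginal p(z)
    ratio = pxz_s / pz[None, None, :]
    return -alpha * (r_sk + r_phi - r_m) - np.log(np.sum(pxzs * ratio ** alpha))

# Hypothetical degraded binary DMBC: p(x,y|s) followed by p(z|y).
rng = np.random.default_rng(1)
ps = np.array([0.5, 0.5])
pxy_s = rng.dirichlet(np.ones(4), size=2).reshape(2, 2, 2)
pz_y = rng.dirichlet(np.ones(2), size=2)
pxz_s = np.einsum("sxy,yz->sxz", pxy_s, pz_y)

r_sk, r_phi, r_m = 0.01, 0.75, 0.2            # illustrative rates in nats
rhos = np.linspace(0.0, 1.0, 101)             # grid for the maximization in (27)
alphas = np.linspace(0.01, 1.0, 100)          # grid for the supremum in (29)
e_opt = max(reliability_exponent(ps, pxy_s, r, r_phi, r_m) for r in rhos)
f_opt = max(secrecy_exponent(ps, pxz_s, a, r_sk, r_phi, r_m) for a in alphas)
print(f"E_0 = {e_opt:.4f}, F_0 (grid approximation) = {f_opt:.4f}")
```

Both functions vanish at $\rho = 0$ and $\alpha = 0$ respectively, so the optimized reliability exponent is never negative; whether it is strictly positive depends on the rates.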

A. The Inner Bound 

The following theorem provides an inner bound to the
capacity-reliability-secrecy region $\mathcal{R}$.

Theorem 6 (Inner Bound to the Capacity-Reliability-Secrecy
Region). The union of the regions in (30) is an inner bound
to the secret key capacity-reliability-secrecy region, i.e.,

\[ \bigcup_{p(s), R_\Phi, R_M} \mathcal{R}(p(s), R_\Phi, R_M) \subset \mathcal{R}. \tag{31} \]

The proof of this theorem can be found in Section VIII
and hinges on an ML-MAP decoding strategy. More precisely,
given $(y^n, \phi)$, Bob first uses the following rule to estimate
Alice's source of private randomness $\hat{m}$ and Alice's received
sequence $\hat{x}^n$:

\[ (\hat{m}, \hat{x}^n) = \arg\max_{(m, x^n)\,:\,\phi(m, x^n) = \phi} p(y^n | s^n(m))\, p(x^n | y^n, s^n(m)). \tag{32} \]

The function $\phi(m, x^n)$ is a (random) binning function, which
is defined and discussed in greater detail in Section VIII-A.
Upon decoding $(\hat{m}, \hat{x}^n)$, Bob declares his key to be
$k_B = k(\hat{m}, \hat{x}^n)$, where $k(m, x^n)$ is another (random) binning
function. The proof for the secrecy exponent leverages the
properties of the Rényi entropy as in [13], [24].

The union of the regions in (31) is likely to be a strict inner
bound since our coding scheme does not involve the use of any
auxiliary random variables (unlike in Theorem 1). However,
as we shall see in Section IV-C, our analysis of the ML-MAP
strategy shows that all weakly-achievable rates $R_{\mathrm{SK}} < C_{\mathrm{SK}}$
are strongly achievable for degraded channels. This is, in part,
due to the fact that the capacity expression for the degraded
case does not contain any auxiliary random variables.

Another reason as to why the error exponent region is likely 
not tight in general may be distilled from the communication 



for omniscience (CFO) works by Csiszar and Narayan ifTTl : 
later extended by Gohari and Anantharam [35]. Consider the
general version of the CFO work in [35], adapted to our
problem. Without loss of generality we can assume that an
external agent can become omniscient about $X^n$ (i.e., know
$X^n$ perfectly) after receiving Eve's information $(Z^n, \Phi)$ and
the shared secret key $K_A$. If this were not the case, there
would be some piece of information about $X^n$, independent
of $(Z^n, \Phi, K_A)$, that the external agent would require to
know $X^n$ perfectly. In such a setting, Alice could reveal the
needed information on the public channel without lowering the
secret key rate. This follows since what would be revealed
would be independent of $K_A$, and thus of no use to Eve.
Thus, without loss of generality, we assume external agent
omniscience.

Now, say that $Z$ is a degraded version of $Y$. In this
setting Bob can simulate $Z^n$. Bob also has $(\Phi, K_B)$ (note that
$K_B = K_A$ with high probability). So, Bob too can be assumed
to be omniscient about $X^n$. In other words, in the degraded
setting there is no loss in generality in requiring Bob to recover
$X^n$. However, when there is a non-trivial joint distribution
amongst $X$, $Y$ and $Z$ (i.e., the non-degraded case), it is not
necessarily true that Bob can recover $X^n$. Hence the error-exponent
strategy may be strictly suboptimal. This observation
is consistent with the "separation" strategy elucidated in (19)
and (20), as the separation strategy, which is optimal in the
degraded case, in effect implies that Bob can decode $X^n$ as
discussed above.

B. Positivity of Error Exponents and Interpretations 

For a particular choice of input distribution $p(s)$, the following
proposition characterizes the boundary of the achievable
rate-exponent region in (30).

Proposition 7 (Positivity of Error Exponents). For a fixed
input distribution $p(s)$, the exponent $E_0(p(s), R_\Phi, R_M)$ in (27)
is positive if

\[ R_\Phi - R_M > H(X|Y,S) - I(S;Y). \tag{33} \]

Similarly, the exponent $F_0(p(s), R_{\mathrm{SK}}, R_\Phi, R_M)$ in (29) is
positive if

\[ R_{\mathrm{SK}} + R_\Phi - R_M < H(X|Z,S) - I(S;Z). \tag{34} \]

The proposition can be proved by first verifying that $E_0$
(resp. $F_0$) is a concave function of $\rho$ (resp. $\alpha$), then
computing the partial derivative of $E_0$ (resp. $F_0$) with respect
to $\rho$ (resp. $\alpha$), and finally evaluating the slope at $\rho = 0$
(resp. $\alpha = 0$). This is a standard calculation and as such, we
omit the details. See [24, Theorem 3] and its accompanying
remarks for similar calculations.
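For readers who want the omitted step, the slope computation can be sketched as follows. This is a worked outline under the definitions in (26) and (28) with natural logarithms; the shorthand $g$ and $h$ is introduced here for illustration only and does not appear in the paper.

```latex
\begin{align*}
g(\rho) &\triangleq \sum_{y}\Big[\sum_{s,x} p(s)\,p(x,y|s)^{\frac{1}{1+\rho}}\Big]^{1+\rho},
  \qquad g(0) = 1,\\
g'(0) &= \sum_{y} p(y)\log p(y) - \sum_{s,x,y} p(s)\,p(x,y|s)\log p(x,y|s)
       = H(X,Y|S) - H(Y) = H(X|Y,S) - I(S;Y),\\
\frac{\partial E_0}{\partial \rho}\Big|_{\rho=0} &= (R_\Phi - R_M) - \frac{g'(0)}{g(0)}
       = (R_\Phi - R_M) - \big[H(X|Y,S) - I(S;Y)\big],\\[4pt]
h(\alpha) &\triangleq \sum_{x,z,s} p(x,z,s)\Big[\frac{p(x,z|s)}{p(z)}\Big]^{\alpha},
  \qquad h(0) = 1,\\
h'(0) &= \sum_{x,z,s} p(x,z,s)\log\frac{p(x,z|s)}{p(z)} = H(Z) - H(X,Z|S) = I(S;Z) - H(X|Z,S),\\
\frac{\partial F_0}{\partial \alpha}\Big|_{\alpha=0} &= -(R_{\mathrm{SK}} + R_\Phi - R_M) - \frac{h'(0)}{h(0)}
       = \big[H(X|Z,S) - I(S;Z)\big] - (R_{\mathrm{SK}} + R_\Phi - R_M).
\end{align*}
```

Since both exponent functions vanish at the origin, a strictly positive slope there makes the optimized exponent strictly positive, which gives exactly the rate conditions stated in the proposition.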

The rate condition in (33) for the reliability exponent to
be positive may be rewritten in two equivalent ways which
provide somewhat more intuition:

\begin{align}
R_M &< I(S;Y) + \big(R_\Phi - H(X|Y,S)\big), \tag{35a}\\
R_\Phi &> H(X|Y,S) - \big(I(S;Y) - R_M\big). \tag{35b}
\end{align}



Using (35), we now illustrate that there is a fundamental interplay
between source- and channel-type randomness. We see
from (35a) that if $R_\Phi > H(X|Y,S)$ (i.e., the compression rate
is strictly larger than the Slepian-Wolf rate $H(X|Y,S)$ [36]),
we may transmit the message $M$ reliably at rates higher than
$I(S;Y)$, which is the maximum transmission rate when the
input distribution $p(s)$ is used for the channel $p(y|s)$. In
addition, observe from (35b) that if $R_M < I(S;Y)$ (i.e.,
the message has rate strictly smaller than channel capacity
$\max_{p(s)} I(S;Y)$), the public message rate can be strictly
smaller than $H(X|Y,S)$. In other words, we may compress
the source $X$ reliably given side information $(Y,S)$ to a rate
smaller than $H(X|Y,S)$ if $R_M < I(S;Y)$.

The rate condition in (34) for the secrecy exponent to be
positive may also be rewritten in the following two equivalent
forms:

\begin{align}
R_{\mathrm{SK}} + R_\Phi &< H(X|Z,S) + \big(R_M - I(S;Z)\big), \tag{36a}\\
R_M &> I(S;Z) - \big(H(X|Z,S) - (R_{\mathrm{SK}} + R_\Phi)\big). \tag{36b}
\end{align}

The authors in [24, Theorem 3] showed that the secrecy
exponent is positive when $R_{\mathrm{SK}} + R_\Phi < H(X|Z,S)$. However,
we observe from (36a) that if $R_M > I(S;Z)$ (i.e., the
message rate is larger than what Eve can resolve with her
channel $p(z|s)$), the secrecy exponent is positive even though
$R_{\mathrm{SK}} + R_\Phi$ may be larger than $H(X|Z,S)$. Similarly, observe
from (36b) that if $R_{\mathrm{SK}} + R_\Phi < H(X|Z,S)$, then $R_M$ may be
smaller than $I(S;Z)$ for the secrecy exponent to be positive.

C. Strong Achievability and Connections to Degradedness 

If we eliminate the rates $R_\Phi$ and $R_M$ in (33) and (34), we
conclude that $R_{\mathrm{SK}}$ is strongly-achievable (Definition 5) if



TABLE I. SPECIALIZATION OF PROPOSITION 7 TO EXISTING RESULTS



\begin{align}
R_{\mathrm{SK}} &< H(X|Z,S) - I(S;Z) - \big(H(X|Y,S) - I(S;Y)\big) \nonumber\\
&= I(X;Y|S) - I(X;Z|S) - I(S;Z) + I(S;Y) \nonumber\\
&= I(X,S;Y) - I(X,S;Z). \tag{37}
\end{align}



If, in addition, we assume that $(X,S) - Y - Z$ form a Markov
chain (i.e., the degraded condition in Definition 6), then (37)
is equivalent to

\[ R_{\mathrm{SK}} < I(X,S;Y,Z) - I(X,S;Z) = I(X,S;Y|Z). \tag{38} \]

From (37) and (38), we see that the secret key capacity
for degraded channels (Corollary 5) has been recovered.
Interestingly, this alternative method of deriving the secret
key capacity for the degraded case via the error exponent
route demonstrates that for degraded channels, the weak and
strong definitions for achievability (in Definitions 1 and 5
respectively) coincide.

D. Connections to Previous Results 



The reliability exponent in (26) is akin to a combination of
Gallager's exponents for channel coding in [11, Sec. 5.6] and
for source coding with side information in [12]. The secrecy
exponent has been studied for the secret key agreement source
model [13], [30], the corresponding channel model [13], and
the source model with external deterministic excitation [24].
Hayashi 1131 . IZTl also analyzed the exponential decay of 
the information leakage rate for the wiretap channel. The
expression in (28) is akin to a combination of the key leakage
rate due to Eve's DMC $p(z|s)$ [13] and the secrecy exponent
of the excited DMMS $p(x,z|s)$ [24].

Setting                          | Reliability Exponent $E_0$                      | Secrecy Exponent $F_0$
$X = \emptyset$, $R_\Phi = 0$    | Channel coding [11, Theorem 5.6.2]              | Wiretap channel coding [13, Theorem 3]
$S = \emptyset$, $R_M = 0$       | Source coding with side information [12], [24]  | Secret key generation with public discussion [13], [24]

In light of these observations, Proposition 7 may be specialized
to derive conditions for the positivity of the exponents
for the pure channel-type and the pure source-type models as
follows:

1) $X = \emptyset$ and $R_\Phi = 0$: This case specializes to the wiretap
channel $p(y,z|s)$. In this case, the reliability exponent
$E_0(p(s), 0, R_M)$ reduces to that for channel coding over a
discrete memoryless channel (DMC) [11, Theorem 5.6.2]
and (33) results in the condition

\[ R_M < I(S;Y), \tag{39} \]

which we recognize as the condition for reliable communication
over the DMC $p(y|s)$.

In addition, our secrecy exponent $F_0(p(s), R_{\mathrm{SK}}, 0, R_M)$
reduces to Hayashi's wiretap secrecy exponent in [13,
Eq. (14)] and (37) simplifies to the confidential message
rate constraint

\[ R_{\mathrm{SK}} < I(S;Y) - I(S;Z), \tag{40} \]

which we recognize as the condition for reliable communication
and secrecy for the wiretap channel. Note that
the usual auxiliary random variable "$U$" [32, Theorem
22.1] has been taken to be equal to the source $S$ in (40).
2) $S = \emptyset$ and $R_M = 0$: This case specializes to the
secret key generation model with public discussion characterized
by the DMMS $p(x,y,z) = \sum_s p(s)\,p(x,y,z|s)$
studied by Maurer H), Csiszar-Narayan HT), fTS) and 
Gohari-Anantharam [35], [37]. The reliability exponent
$E_0(p(s), R_\Phi, 0)$ was characterized in [12] and was stated
as a special case of the main result in [24]. Our result in
(33) simplifies to



\[ R_\Phi > H(X|Y), \tag{41} \]

which we recognize as the condition for lossless source
coding of $X$ given side information $Y$ [36]. This recovers
an analogue of the result in [24, Theorem 3].
We note here that Watanabe et al. [31] showed that
strongly secure privacy amplification (strong secrecy) is
not achievable by Slepian-Wolf coding. But this does not
contradict our error exponent and strong secrecy result
because the codes used in [31] have rates tending to the
optimal compression rate $H(X|Y)$ in (41) at a rate of
$b/\sqrt{n}$ for some $b > 0$. This constant $b$ is related to the
second-order coding rate or dispersion of Slepian-Wolf
coding [38]. However, we operate at rates strictly above
$H(X|Y)$ in (41), so strong secrecy is indeed possible.
Fig. 3. Secret key capacity of the Gaussian additive interference channel (horizontal axis: $P/\sigma^2$ in dB).
$\rho_{12} = 0.8$, $\rho_{13} = 0.3$, $\nu_1 = \nu_2 = 1$, $\nu_3 = 2$, $\sigma_i = 1$ for all $i \in \{1,2,3\}$.

The secrecy exponent $F_0(p(s), R_{\mathrm{SK}}, R_\Phi, 0)$ was derived
in [13], [24], [30]. Our secrecy exponent result in (34)
specializes in this case to

\[ R_{\mathrm{SK}} + R_\Phi < H(X|Z), \tag{42} \]

which recovers an analogue of the main result in Chou
et al. [24, Theorem 3].
The specializations are summarized in Table I.

V. Numerical Examples 

We consider three examples in this section. The first two
illustrate the tradeoffs involved in the capacity results in
Section III. They include an additive Gaussian interference
channel where the interference of Alice and Bob is correlated
and a binary on-off channel in which Eve receives a degraded
version of Bob's (or Alice's) output. The final example illustrates
the tradeoffs in the achievable error exponent results in
Section IV.

A. Capacity of the Additive Gaussian Interference Channel 

In this section, we consider a Gaussian interference channel
model. While our previous analysis was for the discrete case,
it can be shown through standard arguments via discretization
that the capacity expressions hold for the Gaussian case. See El
Gamal and Kim [32], who attributed this technique to McEliece.
Now, consider the channel model

\begin{align*}
X &= S + I_1 + N_1,\\
Y &= S + I_2 + N_2,\\
Z &= S + I_3 + N_3,
\end{align*}

where $N_i$, $i = 1,2,3$, are independent Gaussian noises distributed
as $\mathcal{N}(0, \sigma_i^2)$. The random variables $I_i$, $i = 1,2,3$,
are distributed as $\mathcal{N}(0, \nu_i^2)$ and model interference at each
receiver. The interferences $I_i$ are independent of $N_i$. It is
assumed that $I_1, I_2, I_3$ are jointly Gaussian with $\mathbb{E}[I_i I_j] =
\rho_{ij}\nu_i\nu_j$, where $\rho_{ij} \in (-1,1)$ is the correlation coefficient. It is
further assumed that the channel is degraded in favor of Bob
in the sense that $\nu_3^2 + \sigma_3^2 > \nu_2^2 + \sigma_2^2$. The input sequence $S^n$
is subject to an average power constraint $P$, that is, the cost
function is $\Lambda(s) = s^2$ and the cost is $\Gamma = P$.

By degradedness, we can define a new random variable
$Z' = Y + N_3'$, where $N_3'$ is independent and distributed as
$\mathcal{N}(0, \nu_3^2 + \sigma_3^2 - \nu_2^2 - \sigma_2^2)$. Note that $(X,S) - Y - Z'$ form a
Markov chain and $Z'$ has the same marginal distribution as $Z$.
Since the secret key capacity depends on Eve's observation only through the marginal distribution $p(x,z|s)$,
from Corollary 5 the secret key capacity is

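\[ C_{\mathrm{SK}} = \max_{p(s)\,:\,\mathbb{E}[S^2] \le P} I(X,S;Y) - I(X,S;Z) = R_{\mathrm{ch}} + R_{\mathrm{src}}. \]

Note that in this case, $R_{\mathrm{src}}$ is not a function of the input
distribution $p(s)$. The optimal input distribution is $S \sim \mathcal{N}(0, P)$,
which is the same as that for the Gaussian wiretap channel [40].
Define $C_0(\mathrm{SNR}) = \frac{1}{2}\log(1 + \mathrm{SNR})$ as the AWGN channel
capacity with signal-to-noise ratio $\mathrm{SNR}$ and $C_1(\rho, \nu_{ij}, \sigma_{ij})$ as

\[ C_1(\rho, \nu_{ij}, \sigma_{ij}) \triangleq C_0\!\left( \frac{\rho^2 \nu_i^2 \nu_j^2}{(\nu_i^2 + \sigma_i^2)(\nu_j^2 + \sigma_j^2) - \rho^2 \nu_i^2 \nu_j^2} \right), \]

where the parameters $\nu_{ij}$ and $\sigma_{ij}$ are defined as $\nu_{ij} = (\nu_i, \nu_j)$
and $\sigma_{ij} = (\sigma_i, \sigma_j)$. With these definitions, $R_{\mathrm{ch}}$ and $R_{\mathrm{src}}$ can
be calculated as

\begin{align*}
R_{\mathrm{ch}} &= I(S;Y) - I(S;Z) = C_0\!\left( \frac{P}{\nu_2^2 + \sigma_2^2} \right) - C_0\!\left( \frac{P}{\nu_3^2 + \sigma_3^2} \right),\\
R_{\mathrm{src}} &= I(X;Y|S) - I(X;Z|S) = C_1(\rho_{12}, \nu_{12}, \sigma_{12}) - C_1(\rho_{13}, \nu_{13}, \sigma_{13}).
\end{align*}

Note that, in this case, $R_{\mathrm{src}}$ depends on the correlation between
the interferences $I_i$, $i = 1,2,3$, and is not a function of $P$. When
we increase the input power $P$, only $R_{\mathrm{ch}}$ increases.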

The secret key capacity for the specific choice of parameters
$\rho_{12} = 0.8$, $\rho_{13} = 0.3$, $\nu_1 = \nu_2 = 1$, $\nu_3 = 2$, and $\sigma_i = 1$ for all
$i \in \{1,2,3\}$ is plotted against the input power $P$ in Fig. 3. As
Lemma 2 suggests, $C_{\mathrm{SK}}(P)$ is non-decreasing and concave
in $P$. When the allowed input power $P$ is small, extracting
common secret information from correlated interference is
important, evidenced by the fact that $R_{\mathrm{src}} > R_{\mathrm{ch}}$ at low power
levels $P$. On the other hand, when the input power is large,
we can simply use the wiretap channel $p(y,z|s)$ without any
significant loss of rate.
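The two rate components for this example can be evaluated directly from the closed-form expressions. The following is a minimal sketch, assuming base-2 logarithms (rates in bits) and the parameter values quoted for Fig. 3; the function names are illustrative, and the sketch is not the code used to produce the figure.

```python
import math

def c0(snr):
    """AWGN capacity 0.5 * log2(1 + SNR), in bits per channel use."""
    return 0.5 * math.log2(1.0 + snr)

def c1(rho, nu_i, nu_j, sigma_i, sigma_j):
    """Mutual information between two correlated noisy interference terms."""
    num = (rho * nu_i * nu_j) ** 2
    den = (nu_i ** 2 + sigma_i ** 2) * (nu_j ** 2 + sigma_j ** 2) - num
    return c0(num / den)

def r_ch(power, nu, sigma):
    """Channel-type rate I(S;Y) - I(S;Z); index 1 is Bob, index 2 is Eve."""
    return (c0(power / (nu[1] ** 2 + sigma[1] ** 2))
            - c0(power / (nu[2] ** 2 + sigma[2] ** 2)))

def r_src(rho12, rho13, nu, sigma):
    """Source-type rate I(X;Y|S) - I(X;Z|S); index 0 is Alice."""
    return (c1(rho12, nu[0], nu[1], sigma[0], sigma[1])
            - c1(rho13, nu[0], nu[2], sigma[0], sigma[2]))

nu = (1.0, 1.0, 2.0)        # interference standard deviations nu_1, nu_2, nu_3
sigma = (1.0, 1.0, 1.0)     # noise standard deviations sigma_1, sigma_2, sigma_3
source_rate = r_src(0.8, 0.3, nu, sigma)

for p_db in (-10, 0, 10, 20):
    power = 10.0 ** (p_db / 10.0)   # P / sigma^2 with sigma = 1
    channel_rate = r_ch(power, nu, sigma)
    print(f"P = {p_db:>3} dB: R_ch = {channel_rate:.4f}, R_src = {source_rate:.4f}, "
          f"sum = {channel_rate + source_rate:.4f} bits")
```

Because the source-type rate does not depend on $P$, it appears as a constant offset, and it exceeds the channel-type rate at the lowest power levels in the loop.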

B. Capacity of the Binary On-off Channel 

In our second example, we consider the binary on-off model

\begin{align*}
X &= H \cdot S \oplus N_1,\\
Y &= H \cdot S \oplus N_2,\\
Z &= (H \cdot \tilde{H}) \cdot S \oplus N_3,
\end{align*}

where all the variables are binary and where the operations
are performed in the field of size 2. Hence, the addition
above is binary modulo-2 addition. The "channel gain" $H$
is $\mathrm{Bern}(q)$ and $\tilde{H}$ is $\mathrm{Bern}(\tilde{q})$. Noise $N_i$ is $\mathrm{Bern}(\delta_i)$ and the
$N_i$ are mutually independent. The channel describes a model
in which, in the absence of noise, Eve's observation is strictly
worse than that of Alice's and Bob's since $\tilde{H}$ is present.

Footnote: The concavity of $C_{\mathrm{SK}}(P)$ would be more apparent if the horizontal axis of Fig. 3 were
linear, but we find that it is more convenient to plot it in dB.

Fig. 4. Secret key rate of the binary on-off channel as a function of $\beta$. The
input $S \sim \mathrm{Bern}(\beta)$. The parameters are $q = 0.5$, $\tilde{q} = 0.8$, $\delta = 0.1$, $\delta_3 =
0.2$. Note that $C_{\mathrm{SK}} = \max_{\beta \in [0,1]} R_{\mathrm{SK}}(\beta)$ and the maximizing $\beta^* \approx 0.59$.

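If $\delta_1 = \delta_2 = \delta$ and $\tilde{q}\delta < \delta_3$, then Eve's channel output
is a degraded version of Bob's. In this case, there exists a
$Z' = \tilde{H}' \cdot Y \oplus N_3'$ for some $\tilde{H}'$, with the same distribution as
$\tilde{H}$, and independent $N_3' \sim \mathrm{Bern}(\delta_3')$ such that $(X,S) - Y - Z'$,
where

\[ \delta_3' = \frac{\delta_3 - \tilde{q}\delta}{1 - 2\tilde{q}\delta}. \]

Let $S \sim \mathrm{Bern}(\beta)$. The first term of $R_{\mathrm{ch}}$ is

\begin{align*}
I(S;Y) &= H(Y) - H(Y|S)\\
&= H_{\mathrm{b}}(\beta q * \delta) - [\beta H(Y|S=1) + (1-\beta) H(Y|S=0)]\\
&= H_{\mathrm{b}}(\beta q * \delta) - \beta H_{\mathrm{b}}(q * \delta) - (1-\beta) H_{\mathrm{b}}(\delta),
\end{align*}

where $H_{\mathrm{b}}(\cdot)$ is the binary entropy function and the operation
$a * b = a(1-b) + (1-a)b$. Similarly, the second term of $R_{\mathrm{ch}}$
can be expressed as

\[ I(S;Z) = H_{\mathrm{b}}(\beta q \tilde{q} * \delta_3) - \beta H_{\mathrm{b}}(q\tilde{q} * \delta_3) - (1-\beta) H_{\mathrm{b}}(\delta_3). \]

The secret key rate due to source $X$ can be calculated as

\begin{align*}
R_{\mathrm{src}} &= I(X;Y|S) - I(X;Z|S)\\
&= \beta\,[I(X;Y|S=1) - I(X;Z|S=1)]\\
&= \beta\,[H_{\mathrm{b}}(q * \delta) - H_{\mathrm{b}}(\delta * \delta) - H_{\mathrm{b}}(q\tilde{q} * \delta_3)\\
&\qquad + (1 - q * \delta) H_{\mathrm{b}}(\delta_3') + (q * \delta) H_{\mathrm{b}}(\tilde{q} * \delta_3')].
\end{align*}

The second equality is because if $S = 0$, the source is not
observed and so there is no mutual information between $X$
and $Y$ (and also between $X$ and $Z$).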

Fig. 5. Plot of the random coding reliability exponent $E_{\mathrm{r}}$ in (43).

The secret key rate when the input is a $\mathrm{Bern}(\beta)$ source
is $R_{\mathrm{SK}}(\beta) = R_{\mathrm{ch}}(\beta) + R_{\mathrm{src}}(\beta)$ and is shown in Fig. 4 as
a function of $\beta$ for the set of parameters $q = 0.5$, $\tilde{q} = 0.8$,
$\delta = 0.1$, $\delta_3 = 0.2$. Note that $R_{\mathrm{ch}}$ is a concave function of
$\beta$ and $R_{\mathrm{src}}$ is a linear function of $\beta$. Also, if $\beta = 0$, then
$R_{\mathrm{src}} = 0$ in this example. In contrast, $R_{\mathrm{src}}$ is positive at all
powers for the additive Gaussian interference channel. When
$\beta = 1$ ($S^n$ is the all ones sequence), the input excites all
common randomness due to the common on-off coefficient
$H$. However, when $\beta = 1$, the secrecy rate of the wiretap
channel $R_{\mathrm{ch}} = 0$. Thus, we observe that there is an inherent
tradeoff between the amount of common randomness and
the wiretap secrecy rate.
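The tradeoff in $\beta$ can be explored by brute-force enumeration of the joint pmf. The following is a minimal sketch, assuming rates in bits and the physical model $X = H \cdot S \oplus N_1$, $Y = H \cdot S \oplus N_2$, $Z = (H \cdot \tilde{H}) \cdot S \oplus N_3$ rather than the degraded surrogate $Z'$; for that reason the grid maximizer it prints need not coincide exactly with the value reported for Fig. 4. All function names are illustrative.

```python
import itertools
import numpy as np

def bern(p, v):
    """P(V = v) for V ~ Bern(p)."""
    return p if v == 1 else 1.0 - p

def joint_pmf(beta, q, q_tilde, d1, d2, d3):
    """p[s, x, y, z] obtained by enumerating S, H, H~, N1, N2, N3 over GF(2)."""
    p = np.zeros((2, 2, 2, 2))
    for s, h, ht, n1, n2, n3 in itertools.product((0, 1), repeat=6):
        w = (bern(beta, s) * bern(q, h) * bern(q_tilde, ht)
             * bern(d1, n1) * bern(d2, n2) * bern(d3, n3))
        p[s, (h & s) ^ n1, (h & s) ^ n2, (h & ht & s) ^ n3] += w
    return p

def mi(pab):
    """I(A;B) in bits for a joint pmf pab[a, b]."""
    prod = pab.sum(axis=1, keepdims=True) * pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return float(np.sum(pab[mask] * np.log2(pab[mask] / prod[mask])))

def hb(p):
    """Binary entropy function in bits."""
    return 0.0 if p <= 0.0 or p >= 1.0 else float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def star(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b

def i_sy_closed_form(beta, q, delta):
    """Closed-form I(S;Y) quoted in the text."""
    return hb(star(beta * q, delta)) - beta * hb(star(q, delta)) - (1 - beta) * hb(delta)

def rates(beta, q=0.5, q_tilde=0.8, delta=0.1, delta3=0.2):
    """Return (R_ch, R_src) in bits for S ~ Bern(beta)."""
    p = joint_pmf(beta, q, q_tilde, delta, delta, delta3)
    i_sy, i_sz = mi(p.sum(axis=(1, 3))), mi(p.sum(axis=(1, 2)))
    i_xsy = mi(p.sum(axis=3).reshape(4, 2))   # I(X,S;Y)
    i_xsz = mi(p.sum(axis=2).reshape(4, 2))   # I(X,S;Z)
    # Chain rule: I(X;Y|S) = I(X,S;Y) - I(S;Y), and likewise for Z.
    return i_sy - i_sz, (i_xsy - i_sy) - (i_xsz - i_sz)

betas = np.linspace(0.0, 1.0, 101)
totals = [sum(rates(b)) for b in betas]
best = betas[int(np.argmax(totals))]
print(f"grid maximizer beta = {best:.2f}, R_SK = {max(totals):.4f} bits")
```

The endpoints behave as described in the text: the source-type rate vanishes at $\beta = 0$ and the channel-type rate vanishes at $\beta = 1$.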

C. Error Exponents 

We now illustrate our error exponent results. We assume
that $\mathcal{X} = \mathcal{Y} = \mathcal{Z} = \mathcal{S} = \{0,1\}$, i.e., all the variables
are binary-valued. We selected the parameters of the DMBC
$p(x,y,z|s)$ randomly but ensured that Eve's observation $Z$ is
a degraded version of Bob's $Y$. We did so by first selecting
the parameters of the conditional distribution $p(x,y|s)$, then
we proceeded to choose the parameters in the conditional
distribution $p(z|y)$. We keep the channel $p(x,y,z|s)$ fixed
throughout this subsection. Define the input distribution-optimized
reliability exponent

\[ E_{\mathrm{r}}(R_\Phi, R_M) \triangleq \max_{p(s)} E_0(p(s), R_\Phi, R_M), \tag{43} \]

where $E_0$ was defined in (27). Also define the input
distribution-optimized secrecy exponent:

\[ F_{\mathrm{r}}(R_{\mathrm{SK}}, R_\Phi, R_M) \triangleq \max_{p(s)} F_0(p(s), R_{\mathrm{SK}}, R_\Phi, R_M), \tag{44} \]

where $F_0$ was defined in (29). Note that for a particular set
of rates $(R_{\mathrm{SK}}, R_\Phi, R_M)$, the optimal input distributions $p^*(s)$
in (43) and (44) may not be identical. We append the subscript
r to $E_{\mathrm{r}}(R_\Phi, R_M)$ and $F_{\mathrm{r}}(R_{\mathrm{SK}}, R_\Phi, R_M)$ to allude to the fact
that in the derivation of these exponents, we use both random
coding [11] and random binning schemes [12].

The functions $E_{\mathrm{r}}(R_\Phi, R_M)$ and $F_{\mathrm{r}}(R_{\mathrm{SK}}, R_\Phi, R_M)$ are plotted
in Figs. 5 and 6 respectively. From Fig. 5 we observe
that $R_\Phi \mapsto E_{\mathrm{r}}(R_\Phi, R_M)$ is a non-decreasing function. This
is intuitive because given more information (i.e., when $R_\Phi$ is
large), Bob can decode the key $K_B$ with greater reliability. In
contrast, $R_M \mapsto E_{\mathrm{r}}(R_\Phi, R_M)$ is a non-increasing function.
This is also intuitive because Alice's private source of randomness
is increased if $R_M$ is increased so Bob's ability to
decode the key is reduced.

From Fig. 6 we observe that $R_\Phi \mapsto F_{\mathrm{r}}(R_{\mathrm{SK}}, R_\Phi, R_M)$
is a non-increasing function. This is because as more public
information is made available to Bob, the key leakage rate
increases, resulting in a smaller secrecy exponent. The function
$R_M \mapsto F_{\mathrm{r}}(R_{\mathrm{SK}}, R_\Phi, R_M)$ is non-decreasing because if Alice
has a source of more private randomness, she can conceal more
of the key from Eve. Finally, $R_{\mathrm{SK}} \mapsto F_{\mathrm{r}}(R_{\mathrm{SK}}, R_\Phi, R_M)$ is
non-increasing because $R_{\mathrm{SK}}$ can be interpreted as the residual
source of secrecy that can be generated by Alice and Bob
while keeping Eve ignorant of the generated key.

Fig. 6. Plot of the random coding secrecy exponent $F_{\mathrm{r}}$ in (44).

Fig. 7. Plot of the reliability exponent $E_0$ and secrecy exponent $F_0$ as a
function of the input distribution parameter $\beta$. For this plot, $p(s) = \mathrm{Bern}(\beta)$,
$R_{\mathrm{SK}} = 0.01$, $R_\Phi = 0.75$ and $R_M = 0.2$.



Fig. 8. Plot of the reliability exponent $E_0$ in (27) and secrecy exponent $F_0$
in (29) for a fixed input distribution $p(s) = \mathrm{Bern}(0.5)$ with $R_{\mathrm{SK}} = 0.01$.



Because the optimal input distributions in (43) and (44)
may be different, in Fig. 7 we plot $E_0(p(s), R_\Phi, R_M)$ and
$F_0(p(s), R_{\mathrm{SK}}, R_\Phi, R_M)$ as functions of $\beta$, the binary source
parameter, i.e., $p(s) = \mathrm{Bern}(\beta)$. The three rates $R_{\mathrm{SK}}$, $R_\Phi$ and
$R_M$ are kept fixed (see caption). Note that $F_0$ can be shown
to be convex in $\beta$ but $E_0$ is neither convex nor concave in
$\beta$. We see that as in the binary on-off channel (cf. Fig. 4
in the previous subsection), there is a tradeoff. However, in
Fig. 4 the tradeoff pertained to the confidential message rate
of the wiretap channel $R_{\mathrm{ch}}$ and the secret key rate (common
randomness) $R_{\mathrm{src}}$. In Fig. 7 the tradeoff is instead with respect
to the reliability and secrecy exponents. In this example, when
$\beta$ is small, $F_0$ dominates (i.e., the secrecy exponent is small).
On the other hand, when $\beta$ is large, $E_0$ dominates (i.e., the
reliability exponent is small). The optimal distributions for $E_0$
and $F_0$ are $\mathrm{Bern}(0.15)$ and $\mathrm{Bern}(1)$ respectively.

In Fig. 8 we plot $E_0(p(s), R_\Phi, R_M)$ and $F_0(p(s), R_{\mathrm{SK}} =
0.01, R_\Phi, R_M)$ for a fixed and common $p(s)$. We take $p(s) =
\mathrm{Bern}(0.5)$, which, as discussed in the previous paragraph,
is not the optimizing input distribution used for the corresponding
plots in Figs. 5 and 6. Note that the function
$E_0(p(s), R_\Phi, R_M)$ is positive if the rate condition in (33)
is satisfied. Correspondingly, the function $F_0(p(s), R_{\mathrm{SK}} =
0.01, R_\Phi, R_M)$ is positive if the rate condition in (34) is
satisfied. Observe that there is a non-empty region in the
$(R_\Phi, R_M)$ plane such that both $E_0$ and $F_0$ are positive. Thus,
$R_{\mathrm{SK}} = 0.01$ is a strongly-achievable secret key rate since
there exists a $(p(s), R_\Phi, R_M)$ triple such that $E_0 > 0$ and
$F_0 > 0$. Finally, for the purpose of clarity, in Fig. 9 we
also show the $R_M = 0.02$ slice of the 3-dimensional plot in
Fig. 8. We see from both Figs. 8 and 9 that there is a tradeoff
between the reliability exponent $E_0$ and the secrecy exponent
$F_0$. Whenever $E_0$ is large, $F_0$ is small and vice versa.
Fig. 9. Plot of the reliability exponent $E_0$ and secrecy exponent $F_0$ for a fixed
input distribution $p(s) = \mathrm{Bern}(0.5)$ with $R_{\mathrm{SK}} = 0.01$ and $R_M = 0.02$.



VI. Extensions 

In this paper, we derived capacity and error exponent results
for the secret key agreement problem when the sender excites
the broadcast channel $p(x,y,z|s)$ via a source $p(s)$. Several
questions arise naturally from this work:

• Is the result in Corollary 3 tight? In other words, can
an upper bound on the secret key capacity with rate
constraint on the public message be found and does the
upper bound match that in (21)?

• As we mentioned in a remark following Theorem 6, the
inner bound of the capacity-reliability-secrecy region may
not be tight. In light of this, one can study how tight
the reliability and secrecy exponents are. Can a sphere-packing-like
outer bound be derived, perhaps using types-based
arguments?

• What changes if we allow Alice and Bob to communicate
over multiple rounds, i.e., multi-way public discussion is
permitted as in [35]? Is such a setting strictly better than
the one-way public message discussion as analyzed in
this paper?

• Can upper and/or lower bounds for the secret key capacity
be derived when the transition probability is state-dependent,
that is, when the channel is a function
of an underlying state $s_0$ so the transition probability
is $p(x,y,z|s,s_0)$? The state $s_0$ may be assumed to be
known at the encoder (Alice).



VII. Proofs of Results in Section III

A. Proof of Converse of Theorem 1

We start with a lemma [3, Lemma 4.1], which is a consequence
of the Csiszár sum identity [32, Ch. 2].

Lemma 8. The following equality holds for arbitrary random
variables $K, \Phi, Y^n, Z^n$:

\[ I(K; Y^n | \Phi) - I(K; Z^n | \Phi) = \sum_{i=1}^{n} I(K; Y_i | Y^{i-1}, Z_{i+1}^{n}, \Phi) - I(K; Z_i | Y^{i-1}, Z_{i+1}^{n}, \Phi). \]

The converse follows from the following steps:

\begin{align}
nR_{\mathrm{SK}} &\overset{(a)}{\le} I(K_A; Y^n, \Phi) + n\epsilon_n \nonumber\\
&\overset{(b)}{\le} I(K_A; Y^n, \Phi) - I(K_A; Z^n, \Phi) + 2n\epsilon_n \nonumber\\
&= I(K_A; Y^n | \Phi) - I(K_A; Z^n | \Phi) + 2n\epsilon_n \nonumber\\
&\overset{(c)}{=} \sum_{i=1}^{n} I(K_A; Y_i | Y^{i-1}, Z_{i+1}^{n}, \Phi) - I(K_A; Z_i | Y^{i-1}, Z_{i+1}^{n}, \Phi) + 2n\epsilon_n, \tag{45}
\end{align}

where (a) is due to Fano's inequality ($\epsilon_n \to 0$ as $n \to \infty$), (b)
is due to the secrecy condition (3), and (c) follows by applying Lemma 8.
Now we make the following identifications of the auxiliary
random variables

\begin{align}
W_i &\triangleq (Y^{i-1}, Z_{i+1}^{n}, \Phi), \nonumber\\
U_i &\triangleq (K_A, W_i), \nonumber\\
V_i &\triangleq K_A. \tag{46}
\end{align}

As can be readily verified, the chosen variables $W_i, U_i, V_i$
satisfy the Markov conditions

\begin{align*}
& W_i - U_i - (S_i, X_i) - (Y_i, Z_i),\\
& V_i - (W_i, U_i, X_i) - (S_i, Y_i, Z_i),
\end{align*}

as required by (13) and (14). Note that since $K_A$ and $\Phi$
(random variables contained in our identifications in $W_i$ and
$U_i$ in (46)) are both functions of $(M, X^n)$ (see problem setting
in Section II), $S_i$ by itself does not separate $(X_i, Y_i, Z_i)$ from
$W_i$ and $U_i$. However, the separation does hold when $(S_i, X_i)$
are grouped together by the discrete memoryless nature of the
channel. Substituting the choice of auxiliary random variables
in (46) into (45) yields

\begin{align*}
nR_{\mathrm{SK}} &\le \sum_{i=1}^{n} I(K_A; Y_i | W_i) - I(K_A; Z_i | W_i) + 2n\epsilon_n\\
&= \sum_{i=1}^{n} I(K_A, W_i; Y_i | W_i) - I(K_A, W_i; Z_i | W_i) + 2n\epsilon_n\\
&= \sum_{i=1}^{n} I(U_i, V_i; Y_i | W_i) - I(U_i, V_i; Z_i | W_i) + 2n\epsilon_n.
\end{align*}

Now, in light of the fact that the auxiliary random variables in
(46) satisfy the Markov conditions, we can upper bound each
term in the above sum by the secret key capacity evaluated at
the $i$-th sample. Finally, by using the definition and concavity
of $C_{\mathrm{SK}}(\Gamma)$ (see Lemma 2), we have

\begin{align}
R_{\mathrm{SK}} &\le \frac{1}{n} \sum_{i=1}^{n} C_{\mathrm{SK}}(\mathbb{E}[\Lambda(S_i)]) + 2\epsilon_n \nonumber\\
&\le C_{\mathrm{SK}}(\Gamma) + 2\epsilon_n. \tag{47}
\end{align}

This completes the proof of the converse.
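The identity in Lemma 8 can be sanity-checked numerically for a small block length. The following is a minimal sketch for $n = 2$, assuming binary alphabets and a randomly drawn joint pmf over $(K, \Phi, Y_1, Y_2, Z_1, Z_2)$; the axis ordering and helper names are choices made here for illustration.

```python
import numpy as np

def entropy(p, keep):
    """Entropy in bits of the marginal of p on the axes listed in `keep`."""
    drop = tuple(ax for ax in range(p.ndim) if ax not in keep)
    m = p.sum(axis=drop) if drop else p
    m = np.asarray(m).ravel()
    m = m[m > 0]
    return float(-np.sum(m * np.log2(m)))

def cmi(p, a, b, c=()):
    """I(A;B|C) in bits, with A, B and C given as tuples of axis indices."""
    a, b, c = set(a), set(b), set(c)
    return (entropy(p, a | c) + entropy(p, b | c)
            - entropy(p, a | b | c) - entropy(p, c))

# Axis assignment (an arbitrary choice): K, Phi, Y1, Y2, Z1, Z2.
K, PHI, Y1, Y2, Z1, Z2 = range(6)

rng = np.random.default_rng(2)
p = rng.dirichlet(np.ones(64)).reshape((2,) * 6)   # arbitrary joint pmf

# Left-hand side: I(K; Y^2 | Phi) - I(K; Z^2 | Phi).
lhs = cmi(p, (K,), (Y1, Y2), (PHI,)) - cmi(p, (K,), (Z1, Z2), (PHI,))

# Right-hand side: the i-th term conditions on (Y^{i-1}, Z_{i+1}^n, Phi).
rhs = (cmi(p, (K,), (Y1,), (Z2, PHI)) - cmi(p, (K,), (Z1,), (Z2, PHI))      # i = 1
       + cmi(p, (K,), (Y2,), (Y1, PHI)) - cmi(p, (K,), (Z2,), (Y1, PHI)))   # i = 2

print(f"lhs = {lhs:.10f}, rhs = {rhs:.10f}")
```

Since the lemma holds for arbitrary random variables, the two sides agree up to floating-point error for any seed.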



B. Proof of Achievability of Theorem 1

1) Typicality Conventions: We will generally use the notion
of typicality in [32]. The typical set (or more precisely the
$\epsilon$-typical set) is denoted as $\mathcal{T}_\epsilon^{(n)}(X)$. More precisely, for a
random variable $X$ with distribution (pmf) $p(x) = p_X(x)$:

\[ \mathcal{T}_\epsilon^{(n)}(X) \triangleq \{ x^n : |\pi(x; x^n) - p(x)| \le \epsilon\, p(x),\ \forall x \in \mathcal{X} \}, \tag{48} \]

where $\pi(x; x^n)$ is the relative frequency of the symbol $x \in \mathcal{X}$
in the sequence $x^n$ (or the type or empirical distribution). We
will usually abbreviate $\mathcal{T}_\epsilon^{(n)}(X)$ to $\mathcal{T}_\epsilon^{(n)}$ when the random
variable is clear from the context. The $\epsilon$-conditional typical
set of $Y|X$ given a sequence $x^n$ is denoted as $\mathcal{T}_\epsilon^{(n)}(X, Y | x^n)$.
This is the set of all $y^n \in \mathcal{Y}^n$ for which $(x^n, y^n) \in
\mathcal{T}_\epsilon^{(n)}(X, Y)$. Note that $\mathcal{T}_\epsilon^{(n)}(X, Y | x^n)$ is an empty set if
$x^n \notin \mathcal{T}_\epsilon^{(n)}(X)$.

For reasons that will be made clear in the following, we
adopt the Delta-convention [41] for the typicality threshold $\epsilon$
in (48) and for all other typicality thresholds in the following
proof. That is, we assume that all $\epsilon$'s are in fact sequences that
satisfy $\epsilon \to 0$ and $n\epsilon^2 \to \infty$ as $n \to \infty$. This ensures that
$P^n(\mathcal{T}_\epsilon^{(n)}(X)) \to 1$ as $n \to \infty$ (by Chebyshev's inequality).
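The membership test in (48) is simple to state in code. The following is a minimal sketch, assuming the pmf is supplied as a dictionary from symbols to probabilities; the function name `is_typical` is illustrative.

```python
from collections import Counter

def is_typical(sequence, pmf, eps):
    """Return True if `sequence` is eps-typical with respect to `pmf` as in (48).

    `pmf` maps each symbol of the alphabet to its probability p(x).
    """
    n = len(sequence)
    counts = Counter(sequence)
    # A symbol outside the alphabet can never satisfy the condition.
    if any(symbol not in pmf for symbol in counts):
        return False
    # |pi(x; x^n) - p(x)| <= eps * p(x) must hold for every symbol x.
    return all(abs(counts[x] / n - p) <= eps * p for x, p in pmf.items())

pmf = {0: 0.5, 1: 0.5}
balanced = [0] * 52 + [1] * 48      # relative frequencies 0.52 and 0.48
skewed = [0] * 60 + [1] * 40        # relative frequencies 0.60 and 0.40
print(is_typical(balanced, pmf, eps=0.1), is_typical(skewed, pmf, eps=0.1))
```

Note that the tolerance is relative to $p(x)$, so a symbol with zero probability must not appear in a typical sequence at all.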

Fix $\epsilon > \epsilon' > \epsilon'' > 0$ and the distributions $p(u,w)$,
$p(x,s|u)$, $p(v|w,u,x)$ achieving the secret key capacity in Theorem 1. By
marginalization and Bayes' rule, this choice of distributions
induces $p(u,w)$, $p(s|u)$ and $p(v|u,w)$. We first prove achievability
for $W = \emptyset$. At the end, we generalize the result.

2) Codebook generation: Define the five index sets:

\begin{align}
\mathcal{M} &= [1 : 2^{n(I(U;Y) - 2\delta)}] \nonumber\\
\mathcal{L} &= [1 : 2^{n(I(V;X|U) + \delta)}] \nonumber\\
\Phi &= [1 : 2^{n(I(V;X|U) - I(V;Y|U) + 2\delta)}] \tag{49}\\
\mathcal{K} &= [1 : 2^{n(I(U,V;Y) - I(U,V;Z))}] \nonumber\\
\mathcal{J} &= [1 : 2^{n(I(V;X|U) + I(U;Y) - \delta)}]. \nonumber
\end{align}

The $\delta$ parameter also satisfies the Delta-convention [41].
Note that $|\mathcal{J}| = |\mathcal{M}||\mathcal{L}|$. The set $\mathcal{K}$ represents
the alphabet of Alice's and Bob's key. The secret key rate is
$\frac{1}{n}\log|\mathcal{K}|$ [compare to the capacity expression in Theorem 1 with $W = \emptyset$].

Randomly and independently generate $2^{n(I(U;Y) - 2\delta)}$ sequences
$u^n(m)$, $m \in \mathcal{M}$, drawn according to $\prod_{i=1}^{n} p(u_i)$. For
each $m \in \mathcal{M}$, randomly and conditionally independently
generate $2^{n(I(V;X|U) + \delta)}$ sequences $v^n(m,l)$, $l \in \mathcal{L}$, according
to $\prod_{i=1}^{n} p(v_i | u_i(m))$. Respectively, we refer to the $u^n(\cdot)$ and
$v^n(\cdot,\cdot)$ vectors as cloud centers and satellite codewords.

Let /:A^x£— >-^x/Cbea deterministic binning function 
defined as follows. First, let /i : Aix C ^- J^ he the functioqj 
defined as fi{m,l) = {m — l)\C\ + I. This function indexes 
the satellite codewords. Note that while the binning function 
is deterministic the codewords are randomly generated. 

^Similai' to Matlab's (:) notation, the raster scan /i "unwraps" the pair of 
indices (m,l). Equivalently, the function /i may be defined as ft(m,l) = 
(I- 1)\M\ +m. 



h 



= 10 


91 :95 


96 : 100 








fc = 2 


11 : 15 


16 : 20 


fc= 1 


1 : 5 


6 : 10 



Fig. 10. Illustration of the functions /i and /2 for \M\ = 5, \C\ = 20, 
|i7| = 100, |#| = 2 and \K,\ = 10. In the top figure, the function /i unwraps 
the \M\ X \C\ = \J\ = 100 indices into a single row. In the bottom figure, 
the function /2 partitions the 100 indices into a two-dimensional grid of bins 
where each bin contains T = |i7|/(|^||A^|) = 5 indices. 



Second, let $f_2 : \mathcal{J} \to \varPhi \times \mathcal{K}$ be a function that induces a partition of the indices in $\mathcal{J}$ (i.e., of the satellite codewords) such that each sub-bin, doubly-indexed by $(\phi, k)$, contains an equal number of $(m, l)$ pairs, namely $T \triangleq |\mathcal{J}|/(|\varPhi||\mathcal{K}|) = 2^{n(I(U,V;Z) - 3\delta)}$. Precisely, we set $\phi \in \varPhi$ and $k \in \mathcal{K}$ as follows: $f_2(j) = (\phi, k)$ if $k = \lceil j/|\mathcal{K}| \rceil$ and $\phi = \lceil [((j - 1) \bmod |\mathcal{K}|) + 1]/T \rceil$. Then, we define the composite function $f(m, l) = f_2(f_1(m,l))$. See Fig. 10 for an illustration of the function $f$.
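As a concrete check of the double-binning map, the following minimal Python sketch reproduces the Fig. 10 example ($|\mathcal{M}| = 5$, $|\mathcal{L}| = 20$, $|\varPhi| = 2$, $|\mathcal{K}| = 10$, $T = 5$). The function names `f1`, `f2` and `f` mirror the text, while the argument names and the helper `ceil_div` are illustrative. It assumes that each key row has width $|\varPhi| T$, which coincides with $|\mathcal{K}| = 10$ in that example, so the row width is written explicitly rather than as $|\mathcal{K}|$.

```python
from collections import Counter

def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def f1(m, l, num_l):
    """Raster scan: map the 1-based pair (m, l) to j in 1..|M||L|."""
    return (m - 1) * num_l + l

def f2(j, num_phi, t):
    """Map j to (phi, k); one key row holds num_phi sub-bins of t indices."""
    row_width = num_phi * t  # equals |K| = 10 in the Fig. 10 example
    k = ceil_div(j, row_width)
    phi = ceil_div(((j - 1) % row_width) + 1, t)
    return phi, k

def f(m, l, num_l, num_phi, t):
    """Composite binning function f(m, l) = f2(f1(m, l))."""
    return f2(f1(m, l, num_l), num_phi, t)

NUM_M, NUM_L, NUM_PHI, NUM_K = 5, 20, 2, 10
T = (NUM_M * NUM_L) // (NUM_PHI * NUM_K)  # indices per sub-bin

# Count how many (m, l) pairs land in each sub-bin (phi, k).
sub_bins = Counter(
    f(m, l, NUM_L, NUM_PHI, T)
    for m in range(1, NUM_M + 1)
    for l in range(1, NUM_L + 1)
)
print(len(sub_bins), set(sub_bins.values()))
```

In this example every one of the $|\varPhi||\mathcal{K}| = 20$ sub-bins receives exactly $T = 5$ pairs, which is the equal-size property the construction relies on.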

To understand the function $f$ better, consider when $f(m,l) = (\phi, k)$. We define the two (projection) maps $\phi(m, l) \triangleq \phi$ and $k(m, l) \triangleq k$. These values index the first bin (public message) and second bin (key) respectively. Define

$$
\begin{aligned}
\mathcal{B}(\phi) &\triangleq \{(u^n(m), v^n(m,l)) : \phi(m, l) = \phi\}, \\
\mathcal{G}(k) &\triangleq \{(u^n(m), v^n(m,l)) : k(m,l) = k\},
\end{aligned}
$$

to be the set of pairs of sequences with first bin index equal to $\phi$ and the set of pairs of sequences with second bin index equal to $k$ respectively. Note that the number of pairs of sequences in $\mathcal{B}(\phi)$ is $|\mathcal{B}(\phi)| = |\mathcal{J}|/|\varPhi| = 2^{n(I(U,V;Y) - 3\delta)}$. It can similarly be verified that $|\mathcal{G}(k)| = 2^{n(I(V;X|U) - I(V;Y|U) + I(U,V;Z) - \delta)}$.
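The bin cardinalities follow from exponent arithmetic alone. The sketch below checks them with exact fractions for one hypothetical set of mutual-information values (the numbers are placeholders, not taken from any channel in the paper). It assumes only the index-set sizes in (49) and the chain rule $I(U,V;Y) = I(U;Y) + I(V;Y|U)$; all variable names are illustrative.

```python
from fractions import Fraction as F

# Hypothetical values in bits per channel use; chosen only to exercise the identities.
I_U_Y = F(3, 10)           # I(U;Y)
I_V_Y_GIVEN_U = F(2, 10)   # I(V;Y|U)
I_V_X_GIVEN_U = F(6, 10)   # I(V;X|U)
I_UV_Z = F(1, 10)          # I(U,V;Z)
DELTA = F(1, 100)

I_UV_Y = I_U_Y + I_V_Y_GIVEN_U  # chain rule

# Exponents (normalized by n) of the five index sets in (49).
exp_M = I_U_Y - 2 * DELTA
exp_L = I_V_X_GIVEN_U + DELTA
exp_Phi = I_V_X_GIVEN_U - I_V_Y_GIVEN_U + 2 * DELTA
exp_K = I_UV_Y - I_UV_Z
exp_J = I_V_X_GIVEN_U + I_U_Y - DELTA

exp_T = exp_J - exp_Phi - exp_K  # sub-bin size |J| / (|Phi||K|)
exp_B = exp_J - exp_Phi          # public-message bin size |J| / |Phi|
exp_G = exp_J - exp_K            # key bin size |J| / |K|
```

With exact rational arithmetic the identities hold with equality, so any bookkeeping slip in the exponents would show up immediately.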

To give the reader some sense of the necessity of this double binning construction, we foreshadow its use in the proof. The cardinality of the public message bin $|\mathcal{B}(\phi)|$ is the number of satellite codewords (and associated cloud centers) that Bob can reliably distinguish given his observation $Y^n$ and knowledge of the public message $\phi$. From them he can recover the key index $k$. As $|\mathcal{K}|$ is, in general, exponentially smaller than $|\mathcal{B}(\phi)|$, many satellite codewords in $\mathcal{B}(\phi)$ correspond to each key index. The choice of $|\mathcal{K}|$ is made so that Eve's observation will be jointly typical with some (exponentially large) number of satellite codewords that correspond to each possible value of $k$, thereby obscuring the correct value of $k$.

While similar design considerations to those discussed above arise in secret key generation and wiretap channels, the probing mechanism requires the use of a hierarchically structured code with cloud centers (for probing) and satellite codewords (for key reconciliation). Note that more than one value of $k$ corresponds to each cloud center $u^n(m)$. Recall that the choice of cloud center leads to the probing mechanism which, cf. (19), leads to the wiretap portion of the secrecy rate. On the other hand, the choice of public message depends on the index of the satellite codeword which, cf. (20), leads to the key generation portion of the secrecy rate. Interpreting this in terms of Fig. 10, the choice of cloud center narrows the set of possible rows from all $|\mathcal{K}|$ possibilities to some subset. The amount of narrowing is the wiretap portion of the secrecy rate. The further choice of row within that subset that specifies the correct value of $k$ is the key generation portion of the secrecy rate.

3) Encoding: Alice selects $M \in \mathcal{M}$ uniformly at random, sets $U^n = U^n(M)$ and generates channel inputs $S^n$ according to $\prod_{i=1}^n p(s_i | U_i(M))$. (Note that we assume the codebook is randomly generated so both the codewords $U^n(\cdot)$ and $M$ are random.) If $(U^n(M), S^n) \notin \mathcal{T}_{\epsilon''}^{(n)}$, then declare an error. Let the event that either the codewords are atypical or that the cost constraint (1) is violated be

$$
\mathcal{E}_0 \triangleq \{(U^n(M), S^n) \notin \mathcal{T}_{\epsilon''}^{(n)}\} \cup \{\Lambda^n(S^n) > \Gamma\}. \tag{50}
$$

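After observing the channel output $X^n$, Alice finds an $l \in \mathcal{L}$ such that $(V^n(M,l), U^n(M), X^n) \in \mathcal{T}_{\epsilon'}^{(n)}$. If no such index is found, declare $l = 1$. If there is more than one such index $l$, pick one uniformly at random. Set $V^n = V^n(M,L)$ where $L$ is the index identified by Alice. Define

$$
\mathcal{E}_1 \triangleq \{(V^n(M,l), U^n(M), X^n) \notin \mathcal{T}_{\epsilon'}^{(n)},\ \forall\, l \in \mathcal{L}\} \tag{51}
$$

to be the encoding error event, where $M$ is Alice's chosen message. Alice generates the public message as $\Phi = \phi(M, L)$ and sets her key according to $K_{\mathrm{A}} = k(M,L)$, where the (deterministic) functions $\phi$ and $k$ are defined in the codebook generation step.

4) Decoding: Bob receives $Y^n$ from the channel output and finds a unique $\hat{m} \in \mathcal{M}$ such that $(U^n(\hat{m}), Y^n) \in \mathcal{T}_{\epsilon}^{(n)}$. If more than one such $\hat{m} \in \mathcal{M}$ is found, declare $\hat{m}$ to be the smallest such index. Define $\hat{M}$ to be this estimate and set $\hat{U}^n \triangleq U^n(\hat{M})$. The error events are

$$
\mathcal{E}_2 \triangleq \{(U^n(M), Y^n) \notin \mathcal{T}_{\epsilon}^{(n)}\}, \tag{52}
$$

where, per above, $M$ corresponds to the $m$ used in encoding, and

$$
\mathcal{E}_3 \triangleq \{\exists\, \tilde{m} \in \mathcal{M} : \tilde{m} \ne M,\ (U^n(\tilde{m}), Y^n) \in \mathcal{T}_{\epsilon}^{(n)}\}, \tag{53}
$$

which is the first decoding error.

After receiving $\Phi$ from the public channel, Bob also finds an index $\hat{l} \in \mathcal{L}$ such that $V^n(\hat{M}, \hat{l}) \in \mathcal{B}(\Phi)$ and $(V^n(\hat{M}, \hat{l}), \hat{U}^n, Y^n) \in \mathcal{T}_{\epsilon}^{(n)}$. If more than one such $\hat{l}$ is found, choose one uniformly at random. If no such index is found, set $\hat{l} = 1$. Let $\hat{L}$ be the index so found. The error events are

$$
\mathcal{E}_4 \triangleq \{(V^n(M,L), U^n(M), Y^n) \notin \mathcal{T}_{\epsilon}^{(n)}\}, \tag{54}
$$

where $L$ corresponds to the $l$ index found in the encoding step, and

$$
\mathcal{E}_5 \triangleq \{\exists\, \tilde{l} \ne L : V^n(\hat{M}, \tilde{l}) \in \mathcal{B}(\Phi),\ (V^n(\hat{M}, \tilde{l}), \hat{U}^n, Y^n) \in \mathcal{T}_{\epsilon}^{(n)}\}, \tag{55}
$$

which is the second decoding error, where $\hat{M}$ is the estimate of Alice's message found in the first decoding step. Bob generates his key as $K_{\mathrm{B}} \triangleq k(\hat{M}, \hat{L})$, where the function $k$ is defined in the codebook generation step.

5) Error probability analysis: The error probability can be decomposed as

$$
P\Big[\bigcup_{j=0}^{5} \mathcal{E}_j\Big] = \sum_{j=0}^{5} P\Big[\mathcal{E}_j \cap \Big(\bigcup_{i=0}^{j-1} \mathcal{E}_i\Big)^c\Big].
$$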



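Firstly, let $\mathcal{E}_{0,a} \triangleq \{(U^n, S^n) \notin \mathcal{T}_{\epsilon''}^{(n)}\}$ and let $\mathcal{E}_{0,b} \triangleq \{\Lambda^n(S^n) > \Gamma\}$. Then $P[\mathcal{E}_{0,a}] \to 0$ by the Law of Large Numbers. So for $\mathcal{E}_0$, it remains to show that $P[\mathcal{E}_{0,b} | \mathcal{E}_{0,a}^c] \to 0$. This follows since $\mathcal{E}_{0,b}$ can be expressed in terms of the empirical expectation as

$$
\mathcal{E}_{0,b} = \Big\{ \sum_{s \in \mathcal{S}} \pi(s; S^n) \Lambda(s) > \Gamma \Big\} = \big\{ \mathsf{E}_{\pi(s;S^n)}[\Lambda(S)] > \Gamma \big\},
$$

because the empirical distribution $\pi(\cdot\,; S^n)$ places a probability mass of $1/n$ on each sample. Furthermore, $(U^n, S^n) \in \mathcal{T}_{\epsilon''}^{(n)}$ means that $S^n \in \mathcal{T}_{\epsilon''}^{(n)}$, which in turn implies by the Typical Average Lemma [32, Chapter 2, Page 26] that $|\mathsf{E}_{\pi(s;S^n)}[\Lambda(S)] - \mathsf{E}_{p(s)}[\Lambda(S)]| \le \epsilon'' \mathsf{E}_{p(s)}[\Lambda(S)]$. So the empirical expectation satisfies $\mathsf{E}_{\pi(s;S^n)}[\Lambda(S)] \le (1 + \epsilon'') \mathsf{E}_{p(s)}[\Lambda(S)] \le \Gamma$ because $p(s)$ achieves $C_{\mathrm{SK}}(\Gamma/(1+\epsilon''))$ and so $\mathsf{E}_{p(s)}[\Lambda(S)] \le \Gamma/(1+\epsilon'')$. In light of this, $P[\mathcal{E}_{0,b} | \mathcal{E}_{0,a}^c] = 0$ for all $n$. We can conclude that the cost constraint in (1) is satisfied asymptotically because

$$
P[\mathcal{E}_{0,b}] \le P[\mathcal{E}_{0,b} | \mathcal{E}_{0,a}^c] + P[\mathcal{E}_{0,a}],
$$

and both terms can be made arbitrarily small for sufficiently large blocklengths.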
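The step from typicality to the cost bound can be illustrated numerically. The sketch below uses a hypothetical ternary input pmf and cost function (neither is from the paper), tests robust typicality in the sense $|\pi(s; s^n) - p(s)| \le \epsilon\, p(s)$ for every $s$, and evaluates the empirical cost of one typical sequence. The function names are illustrative.

```python
from collections import Counter

def empirical_pmf(seq, alphabet):
    """Type of the sequence: relative frequency of each symbol."""
    counts = Counter(seq)
    n = len(seq)
    return {s: counts[s] / n for s in alphabet}

def is_typical(seq, p, eps):
    """Robust typicality: |pi(s) - p(s)| <= eps * p(s) for every symbol s."""
    pi = empirical_pmf(seq, p.keys())
    return all(abs(pi[s] - p[s]) <= eps * p[s] for s in p)

def empirical_cost(seq, cost):
    """Average per-letter cost of the sequence."""
    return sum(cost[s] for s in seq) / len(seq)

P_S = {0: 0.5, 1: 0.3, 2: 0.2}     # hypothetical p(s)
COST = {0: 0.0, 1: 1.0, 2: 4.0}    # hypothetical Lambda(s)
EPS = 0.1
expected_cost = sum(P_S[s] * COST[s] for s in P_S)

# A length-100 sequence whose type is close to p(s).
seq = [0] * 52 + [1] * 29 + [2] * 19
print(is_typical(seq, P_S, EPS), empirical_cost(seq, COST), expected_cost)
```

For any sequence that passes `is_typical`, the empirical cost is within a factor $(1 \pm \epsilon)$ of the expected cost, which is why backing off the cost target to $\Gamma/(1+\epsilon'')$ suffices.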

Secondly, for each $m$, the number of $v^n$ sequences is $|\mathcal{L}| = 2^{n(I(V;X|U) + \delta)}$. By the Covering Lemma [32], for sufficiently small $\epsilon$ (the typicality tolerance) relative to $\delta$, $P[\mathcal{E}_1 \cap \mathcal{E}_0^c] \to 0$.

Thirdly, by the Conditional Typicality Lemma (or Markov Lemma) [32, Lemma 12.1] (and the choice $\epsilon > \epsilon' > \epsilon'' > 0$), $P[\mathcal{E}_2 \cap \mathcal{E}_0^c] \to 0$ and $P[\mathcal{E}_4 \cap \mathcal{E}_0^c \cap \mathcal{E}_1^c] \to 0$. The second assertion requires a little more explanation. We can deduce from the Markov conditions (13) and (14) that $U - (X,S) - Y$ and $V - (U,X) - (Y,S)$ and hence the joint distribution factors as

$$
p(v, u, x, s, y) = p(v | u, s, x)\, p(u, s, x)\, p(y | x, s).
$$

Thus, the codewords satisfy the Markov relationship $V^n(M,L) - (U^n(M), S^n, X^n) - Y^n$. Furthermore, the $(X, S)$ to $Y$ relationship is a DMC. So by the Markov lemma [32, Lemma 12.1], $(V^n(M,L), U^n(M), Y^n)$ will be typical with high probability if $\epsilon'$ is sufficiently smaller than $\epsilon$ (cf. event $\mathcal{E}_4$ and [32]).

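To bound the error event $\mathcal{E}_3$, note that since $|\mathcal{M}| = 2^{n(I(U;Y) - 2\delta)}$, by the Packing Lemma [32], $P[\mathcal{E}_3 \cap (\bigcup_{i=0}^{2} \mathcal{E}_i)^c] \to 0$.

Finally, since $|\mathcal{L}|/|\varPhi| = 2^{n(I(V;Y|U) - \delta)}$, by the Packing Lemma $P[\mathcal{E}_5 \cap (\bigcup_{i=0}^{4} \mathcal{E}_i)^c] \to 0$. For this final step, note that we have conditioned on the event that Bob decoded $m$ correctly so the "cloud center" $u^n$ is known.

6) Equivocation rate analysis: We condition on a random codebook $\mathscr{C}$ throughout. In this case, we consider the quantity

$$
\begin{aligned}
\frac{1}{n} I(K_{\mathrm{A}}; Z^n, \Phi | \mathscr{C}) &= \frac{1}{n} H(K_{\mathrm{A}} | \mathscr{C}) - \frac{1}{n} H(K_{\mathrm{A}} | Z^n, \Phi, \mathscr{C}) \\
&\le [I(U,V;Y) - I(U,V;Z)] - \frac{1}{n} H(K_{\mathrm{A}} | Z^n, \Phi, \mathscr{C}).
\end{aligned} \tag{56}
$$

The inequality in (56) is due to the code construction, namely that $|\mathcal{K}| = 2^{n(I(U,V;Y) - I(U,V;Z))}$. It remains to show that

$$
H(K_{\mathrm{A}} | Z^n, \Phi, \mathscr{C}) \ge n[I(U,V;Y) - I(U,V;Z)] - n\epsilon_n, \tag{57}
$$

where $\epsilon_n \to 0$ as $n \to \infty$. This will prove the existence of a codebook $\mathcal{C}^*$ for which $\frac{1}{n} I(K_{\mathrm{A}}; Z^n, \Phi | \mathscr{C} = \mathcal{C}^*)$ is arbitrarily small for $n$ sufficiently large. For this purpose, write the equivocation $H(K_{\mathrm{A}} | Z^n, \Phi, \mathscr{C})$ as

$$
H(K_{\mathrm{A}} | Z^n, \Phi, \mathscr{C}) = H(K_{\mathrm{A}}, M, L | Z^n, \Phi, \mathscr{C}) - H(M, L | K_{\mathrm{A}}, Z^n, \Phi, \mathscr{C}). \tag{58}
$$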

Let us first focus on the first term in (58). We have

$$
\begin{aligned}
H(K_{\mathrm{A}}, M, L | Z^n, \Phi, \mathscr{C}) &= H(K_{\mathrm{A}}, M, L, U^n, V^n | Z^n, \Phi, \mathscr{C}) \\
&= H(K_{\mathrm{A}}, \Phi, M, L, U^n, V^n | Z^n, \mathscr{C}) - H(\Phi | Z^n, \mathscr{C}) \\
&\ge H(U^n, V^n | Z^n, \mathscr{C}) - H(\Phi | Z^n, \mathscr{C}),
\end{aligned} \tag{59}
$$

where the first equality is because $(U^n, V^n)$ is a function of $(M,L)$. Recall that $U^n = U^n(M)$ and $V^n = V^n(M,L)$ denote the cloud center and the satellite codeword respectively. We now single-letterize $H(U^n, V^n | Z^n, \mathscr{C})$ in (59) using the following lemma whose proof is provided at the end of the section.

Lemma 9 (Single Letterization of $H(U^n, V^n | Z^n, \mathscr{C})$). For every $\gamma > 0$, the following holds for all sufficiently large $n$:

$$
H(U^n, V^n | Z^n, \mathscr{C}) \ge n[H(U,V|Z) - H(V|U,X)] - n\gamma. \tag{60}
$$

Note that in (60), $(U, V, X, Z)$ have the distribution that we fixed at the start of the achievability proof.
Substituting (60) into (59), we have

$$
\begin{aligned}
H(K_{\mathrm{A}}, M, L | Z^n, \Phi, \mathscr{C}) &\ge n[H(U,V|Z) - H(V|U,X)] - H(\Phi | Z^n, \mathscr{C}) - n\gamma \\
&\overset{(a)}{\ge} n[H(U|Z) + H(V|U,Z) - H(V|U,X)] - H(\Phi) - n\gamma \\
&\overset{(b)}{\ge} n[H(U|Z) + H(V|U,Z) - H(V|U,X) - I(V;X|U) + I(V;Y|U) - 2\delta - \gamma] \\
&\overset{(c)}{\ge} n[H(U|Z) - H(U|Y) + H(V|U,Z) - H(V|U,X) - I(V;X|U) + I(V;Y|U) - 2\delta - \gamma] \\
&= n[I(U;Y) - I(U;Z) + I(V;X|U) - I(V;Z|U) - I(V;X|U) + I(V;Y|U) - 2\delta - \gamma] \\
&= n[I(U,V;Y) - I(U,V;Z) - 2\delta - \gamma],
\end{aligned} \tag{61}
$$

where (a) follows because conditioning reduces entropy and from the chain rule, (b) follows from the fact that $H(\Phi) \le \log |\varPhi| = n[I(V;X|U) - I(V;Y|U) + 2\delta]$ by the code construction, cf. (49), and (c) holds because $H(U|Y) \ge 0$.

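Now we bound the second term in (58). We claim that

$$
H(M, L | K_{\mathrm{A}}, Z^n, \Phi, \mathscr{C}) \le n\epsilon_n' \tag{62}
$$

for some sequence $\epsilon_n' \to 0$. For this purpose, we show that there exists a decoding function $(\hat{M}, \hat{L}) = g(K_{\mathrm{A}}, Z^n, \Phi)$ such that $P[(\hat{M}, \hat{L}) \ne (M, L)] \to 0$ as $n \to \infty$. Then, by applying Fano's inequality, we get (62).

Let $g : \mathcal{K} \times \mathcal{Z}^n \times \varPhi \to \mathcal{M} \times \mathcal{L}$ be a joint typicality decoder. More precisely, declare $g(k_{\mathrm{A}}, z^n, \phi) = (\hat{m}, \hat{l})$ if there is a unique pair of sequences $(u^n(\hat{m}), v^n(\hat{m}, \hat{l}))$ such that $(u^n(\hat{m}), v^n(\hat{m}, \hat{l}), z^n) \in \mathcal{T}_{\epsilon}^{(n)}$ and $f(\hat{m}, \hat{l}) = (\phi, k_{\mathrm{A}})$, where $f$ is defined in the code construction. Otherwise, set $g(k_{\mathrm{A}}, z^n, \phi)$ to be $(1,1)$. Let $M$ and $L$ be the message chosen by Alice and the (covering) index chosen by Alice in (51) respectively. Define the two error events

$$
\begin{aligned}
\mathcal{H}_0 &\triangleq \{(U^n(M), V^n(M,L), Z^n) \notin \mathcal{T}_{\epsilon}^{(n)}\}, \\
\mathcal{H}_1 &\triangleq \{\exists\, (\tilde{m}, \tilde{l}) \ne (M, L) : f(\tilde{m}, \tilde{l}) = (\Phi, K_{\mathrm{A}}),\ (U^n(\tilde{m}), V^n(\tilde{m}, \tilde{l}), Z^n) \in \mathcal{T}_{\epsilon}^{(n)}\}.
\end{aligned}
$$

Eve's decoding error event is $\mathcal{H} \triangleq \mathcal{H}_0 \cup \mathcal{H}_1$. By the union bound, the decoding error probability is bounded as $P[\mathcal{H}] \le P[\mathcal{H}_0] + P[\mathcal{H}_1]$. Now $P[\mathcal{H}_0] \to 0$ by the law of large numbers. Since there are $|\mathcal{J}|/(|\varPhi||\mathcal{K}|) = 2^{n(I(U,V;Z) - 3\delta)}$ sequences in each sub-bin, by the Packing Lemma, $P[\mathcal{H}_1] \to 0$. Thus, $P[\mathcal{H}] \to 0$ as desired. This argument is similar to the argument for $\mathcal{E}_3$ defined in (53) in the error analysis, with $Z$ replacing $Y$. This proves (62). Hence, (57) is proved where $\epsilon_n := \epsilon_n' + 2\delta + \gamma$ can be made arbitrarily small.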

Proof of Lemma 9: We first prove (60). Using the chain rule, the multi-letter entropy can be written as

$$
H(U^n, V^n | Z^n, \mathscr{C}) = H(U^n, V^n, Z^n | \mathscr{C}) - H(Z^n | \mathscr{C}). \tag{63}
$$

Clearly, the second term in (63) can be upper bounded as

$$
H(Z^n | \mathscr{C}) \le \sum_{i=1}^{n} H(Z_i | \mathscr{C}) = nH(Z), \tag{64}
$$

because $H(Z_i | \mathscr{C})$ is an average over all codebooks so it equals $H(Z)$. Note that the distribution of $Z$ in (64) is

$$
p(z) = \sum_{u, s} p(u)\, p(s|u)\, p(z|s),
$$

where $p(u)$ and $p(s|u)$ are the distributions we fixed at the start of the proof. So it remains to lower bound the first term in (63). We introduce an additional $X^n$ for this purpose. Consider

$$
\begin{aligned}
H(U^n, V^n, Z^n | \mathscr{C}) &= H(U^n, V^n, X^n, Z^n | \mathscr{C}) - H(X^n | U^n, V^n, Z^n, \mathscr{C}) \\
&= H(Z^n | U^n, V^n, X^n, \mathscr{C}) + H(U^n, V^n, X^n | \mathscr{C}) - H(X^n | U^n, V^n, Z^n, \mathscr{C}).
\end{aligned} \tag{65}
$$

The first term in (65) can be single-letterized by noting that

$$
H(Z^n | U^n, V^n, X^n, \mathscr{C}) = H(Z^n | U^n, X^n, \mathscr{C}) \tag{66}
$$

because from the covering step in (51), $V^n$ depends only on $(U^n, X^n)$ (and implicitly on the codebook $\mathscr{C}$) so

$$
p(x^n, u^n, v^n, z^n, \mathcal{C}) = p(x^n, u^n, \mathcal{C})\, p(v^n | x^n, u^n, \mathcal{C})\, p(z^n | x^n, u^n, \mathcal{C}).
$$

Hence, $V^n - (U^n, X^n, \mathscr{C}) - Z^n$ forms a Markov chain in that order, explaining the dropping of $V^n$ in (66). Now we write (66) as a difference of two terms for clarity:

$$
H(Z^n | U^n, X^n, \mathscr{C}) = H(Z^n, X^n | U^n, \mathscr{C}) - H(X^n | U^n, \mathscr{C}). \tag{67}
$$

We claim that $H(Z^n, X^n | U^n, \mathscr{C}) = nH(Z, X | U)$ and $H(X^n | U^n, \mathscr{C}) = nH(X|U)$. This is because, by the random construction of the codebook, $U^n$ is generated at random according to $\prod_{i=1}^n p(u_i)$, $(X^n, Z^n)$ (the two outputs of the broadcast channel) depend only on $U^n$ (through an intermediate $S^n$), and $U^n$ to $(X^n, Z^n)$ is a DMC since $U^n$ to $S^n$ is a DMC and $S^n$ to $(X^n, Z^n)$ is a DMC. As a result,

$$
\begin{aligned}
H(Z^n | U^n, V^n, X^n, \mathscr{C}) &= nH(X, Z | U) - nH(X | U) \\
&= nH(Z | X, U) = nH(Z | X, U, V),
\end{aligned} \tag{68}
$$

where the last equality holds because $V - (U,X) - Z$ forms a Markov chain from (14).

We bound the second term in (65) as follows:

$$
H(U^n, V^n, X^n | \mathscr{C}) \ge H(U^n, X^n | \mathscr{C}) = nH(U, X). \tag{69}
$$

The inequality will be tight for a deterministic satellite encoding function for which $H(V^n | U^n, X^n, \mathscr{C}) = 0$. The equality follows from the same reasoning that led to (68): namely that the expectation involved in the entropy is also taken with respect to the choice of the (random) codebook, so $U^n$ is generated in an i.i.d. manner and $X^n$ is related to $U^n$ by a DMC.

Now we upper bound the third term in (65). We define the binary random variable $E_1 := \mathbb{1}\{\mathcal{E}_1\}$, i.e., $E_1$ takes the value 1 if $\mathcal{E}_1$ occurs and 0 otherwise, cf. (51) for the definition of $\mathcal{E}_1$. We have

$$
\begin{aligned}
H(X^n | U^n, V^n, Z^n, \mathscr{C}) &\le H(X^n, E_1 | U^n, V^n, Z^n, \mathscr{C}) \\
&\le H(X^n | E_1, U^n, V^n, Z^n, \mathscr{C}) + 1,
\end{aligned} \tag{70}
$$

where (70) holds by the chain rule for entropy and because $E_1$ can only take on two values. Now, we have the following chain of inequalities:

$$
\begin{aligned}
H(X^n | U^n, V^n, Z^n, \mathscr{C}) &\le H(X^n | E_1 = 0, U^n, V^n, Z^n, \mathscr{C}) + P(\mathcal{E}_1) H(X^n | E_1 = 1, U^n, V^n, Z^n, \mathscr{C}) + 1 \\
&\le H(X^n | E_1 = 0, U^n, V^n, Z^n, \mathscr{C}) + n\tau_n.
\end{aligned} \tag{71}
$$

In (71), we defined $\tau_n \triangleq P(\mathcal{E}_1) \log |\mathcal{X}| + \frac{1}{n}$, a sequence that tends to zero as $n$ tends to infinity because $P(\mathcal{E}_1) \to 0$ (by the choice of rates and the covering lemma). Conditioned on $\{E_1 = 0\}$ (equivalently, $\mathcal{E}_1^c$ occurs), $(U^n, V^n, X^n)$ are $\epsilon''$-jointly typical. In addition, $Z^n$ depends only on $(U^n, X^n)$ (and implicitly on the random codebook $\mathscr{C}$) in a discrete memoryless fashion. See the reasoning after (67). Also note that $V - (U,X) - Z$. Hence, by the Markov lemma [32, Lemma 12.1], the quadruple $(U^n, V^n, X^n, Z^n)$ is $\eta$-jointly typical with high probability, albeit with a different (and larger) typicality tolerance $\eta$. We state this more precisely as follows. Define the indicator random variable

$$
F \triangleq \mathbb{1}\{(V^n, U^n, X^n, Z^n) \notin \mathcal{T}_{\eta}^{(n)}\}. \tag{72}
$$

By the Markov lemma (the conditional typicality lemma, in fact, suffices), $P(F = 0) \to 1$ if $\epsilon''$ is sufficiently smaller than $\eta$. Let $\zeta_n \triangleq P(F = 1) \log |\mathcal{X}| + \frac{1}{n}$ and note that $\zeta_n \to 0$ as $n \to \infty$. Now by using the same method that led to (71), we have

$$
H(X^n | E_1 = 0, U^n, V^n, Z^n, \mathscr{C}) \le H(X^n | E_1 = 0, F = 0, U^n, V^n, Z^n, \mathscr{C}) + n\zeta_n. \tag{73}
$$

Now, $\{F = 0\}$ means that the quadruple $(V^n, U^n, X^n, Z^n)$ is $\eta$-typical. When this occurs, $X^n$ also belongs to the $\eta$-conditional typical set $\mathcal{T}_{\eta}^{(n)}(X, U, V, Z | u^n, v^n, z^n)$. (Refer to the discussion following (48) for the definition of the conditional typical set.) Furthermore, the $\eta$-conditional typical set has cardinality upper bounded as

$$
|\mathcal{T}_{\eta}^{(n)}(X, U, V, Z | u^n, v^n, z^n)| \le 2^{n[H(X|U,V,Z) + \delta(\eta)]}. \tag{74}
$$

See [32, Chapter 2]. Since $H(B) \le \log |\mathcal{B}|$ for any discrete random variable $B$ with alphabet $\mathcal{B}$, we have

$$
H(X^n | E_1 = 0, F = 0, U^n, V^n, Z^n, \mathscr{C}) \le n[H(X|U,V,Z) + \delta(\eta)]. \tag{75}
$$

Uniting (71), (73) and (75) yields

$$
H(X^n | U^n, V^n, Z^n, \mathscr{C}) \le n[H(X|U,V,Z) + \gamma], \tag{76}
$$

where $\gamma = \delta(\eta) + \tau_n + \zeta_n$. Note that $\gamma$ is a sequence that becomes arbitrarily small with increasing blocklength $n$; cf. the Delta-convention discussion after (48).

Inserting the bounds in (68), (69) and (76) into (65), we have

$$
\begin{aligned}
H(U^n, V^n, Z^n | \mathscr{C}) &\ge n[H(Z|U,V,X) + H(U,X) - H(X|U,V,Z)] - n\gamma \\
&= n[H(Z|U,V,X) + H(U,V,X) - H(V|U,X) - H(X|U,V,Z)] - n\gamma \\
&= n[H(U,V,Z) - H(V|U,X)] - n\gamma.
\end{aligned} \tag{77}
$$

Combining (64) with (77) yields (60), completing the proof of Lemma 9. $\square$

7) Completion of Achievability Proof by Introducing $W$: To complete the proof, let $p(w)$ be the optimizing $W$ distribution in (8). Let $\pi_n(w) \in \mathcal{P}_n(\mathcal{W})$ be a sequence of $n$-types such that $\pi_n(w) \to p(w)$ for every $w \in \mathcal{W}$. Fix $w^n$ as some sequence in the type class of $\pi_n(w)$, i.e., the type of $w^n$ is equal to $\pi_n(w)$. The sequence $w^n$ is appended to the codebook and thus known to all parties. Then we follow the previous proof by replacing the marginal distributions with the conditional distributions in the codebook generation step (as in [3, Lemma A]). More precisely, we use $\prod_{i=1}^n p(u_i, s_i | w_i)$ in place of $\prod_{i=1}^n p(u_i, s_i)$ and $\prod_{i=1}^n p(v_i | u_i, w_i)$ in place of $\prod_{i=1}^n p(v_i | u_i)$. This achieves the rate $I(U,V;Y|W) - I(U,V;Z|W)$ as desired.

Hence, the rate $C_{\mathrm{SK}}(\Gamma/(1+\epsilon''))$ is achievable. The proof of the achievability of (8) is completed by taking $\epsilon'' \to 0$ and appealing to the continuity of $C_{\mathrm{SK}}$ (Lemma 2). $\square$

C. Proof of the Cardinality Bounds

We provide only a sketch of the proof of the cardinality bounds on the auxiliary random variables $W$, $U$ and $V$ since the technique used has by now become standard. We use the Fenchel-Eggleston-Caratheodory Theorem described in Appendix C of [32]. In particular, for $W$ and $U$, we basically follow along the same lines as the cardinality bounds in [6, Theorem 1].

The alphabet $\mathcal{W}$ should have $|\mathcal{S}| - 1$ elements to preserve $p(s)$, six additional elements to preserve $H(Y|W)$, $H(Z|W)$, $H(Y|W,U)$, $H(Z|W,U)$, $H(Y|W,U,V)$ and $H(Z|W,U,V)$ [which preserve the difference of mutual information quantities in (19) and (20)] and two additional elements to preserve the Markov chains (13) and (14). This gives $|\mathcal{W}| \le |\mathcal{S}| + 7$, which is (10). Note that the preservation of $p(s)$ automatically implies the preservation of the expected cost $\mathsf{E}[\Lambda(S)]$.

For every $w \in \mathcal{W}$, the alphabet $\mathcal{U}_w$ should have $|\mathcal{S}| - 1$ elements to preserve $p(s|w)$, four additional elements to preserve $H(Y|W = w, U)$, $H(Z|W = w, U)$, $H(Y|W = w, U, V)$ and $H(Z|W = w, U, V)$ and two additional elements to preserve the two Markov chains. This gives $|\mathcal{U}| \le (|\mathcal{S}| + 5)(|\mathcal{S}| + 7)$, which is (11).

The size of the alphabet $\mathcal{V}$ can be bounded similarly by referring to the Markov chain in (14) and noticing that $|\mathcal{W}||\mathcal{U}||\mathcal{X}| \le |\mathcal{X}|(|\mathcal{S}| + 5)(|\mathcal{S}| + 7)^2$. Essentially, we need to preserve the distribution of $(W, U, X)$, two entropy quantities $H(Y|W = w, U = u, V)$ and $H(Z|W = w, U = u, V)$ and one Markov condition (14). This gives (12). $\square$

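D. Proof of Lemma 2

That $C_{\mathrm{SK}}(\Gamma)$ is non-decreasing is evident from its definition. We now show that $C_{\mathrm{SK}}(\Gamma)$ is concave. Fix two length-$n$ codes $\mathcal{C}_1$ and $\mathcal{C}_2$ that achieve $C_{\mathrm{SK}}(\Gamma_1)$ and $C_{\mathrm{SK}}(\Gamma_2)$ respectively. Consider the length-$2n$ code $\mathcal{C}$ that is the concatenation of $\mathcal{C}_1$ and $\mathcal{C}_2$. That is, we use the channel $2n$ times in which for the first $n$ channel uses, we use $\mathcal{C}_1$ and for the remaining $n$, we use $\mathcal{C}_2$. Then the total cost of $\mathcal{C}$ is

$$
\sum_{i=1}^{2n} \Lambda(s_i) = \sum_{i=1}^{n} \Lambda(s_i) + \sum_{i=n+1}^{2n} \Lambda(s_i) \le n(\Gamma_1 + \Gamma_2), \tag{78}
$$

since the first and second codes have per-letter costs no larger than $\Gamma_1$ and $\Gamma_2$ respectively. Hence, $\mathcal{C}$ satisfies $\frac{1}{2n} \sum_{i=1}^{2n} \Lambda(s_i) \le \frac{1}{2}(\Gamma_1 + \Gamma_2)$. We have constructed a codebook with rate $\frac{1}{2}(C_{\mathrm{SK}}(\Gamma_1) + C_{\mathrm{SK}}(\Gamma_2))$ and with cost at most $\frac{1}{2}(\Gamma_1 + \Gamma_2)$. Thus,

$$
C_{\mathrm{SK}}\big(\tfrac{1}{2}(\Gamma_1 + \Gamma_2)\big) \ge \tfrac{1}{2}\big(C_{\mathrm{SK}}(\Gamma_1) + C_{\mathrm{SK}}(\Gamma_2)\big), \tag{79}
$$

i.e., $C_{\mathrm{SK}}(\Gamma)$ is mid-point concave. Since $C_{\mathrm{SK}}(\Gamma)$ is non-decreasing, its level sets are intervals and so it is Lebesgue measurable. Combining measurability with mid-point concavity, Sierpinski's theorem [42, pp. 12] implies that $C_{\mathrm{SK}}(\Gamma)$ is concave. Since a concave function on an open set is also continuous, $C_{\mathrm{SK}}(\Gamma)$ is continuous on $(0, \infty)$.

Note that the above proof is an operational one and does not depend on the functional form of $C_{\mathrm{SK}}(\Gamma)$ in (8). $\square$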



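E. Proof of Proposition 4

We prove the upper bound in (23). Consider the inequalities:

$$
\begin{aligned}
nR_{\mathrm{SK}} &\overset{(a)}{\le} I(K_{\mathrm{A}}; Y^n, \Phi) + n\epsilon_n \\
&\le I(K_{\mathrm{A}}; Y^n, \Phi, Z^n) + n\epsilon_n \\
&\overset{(b)}{\le} I(K_{\mathrm{A}}; Y^n | \Phi, Z^n) + 2n\epsilon_n \\
&\le I(K_{\mathrm{A}}, \Phi; Y^n | Z^n) + 2n\epsilon_n,
\end{aligned} \tag{80}
$$

where (a) follows from Fano's inequality and (b) is due to the secrecy condition (3). Continuing from (80), we have

$$
\begin{aligned}
nR_{\mathrm{SK}} &\overset{(c)}{\le} I(X^n, M; Y^n | Z^n) + 2n\epsilon_n \\
&= I(X^n; Y^n | Z^n) + I(M; Y^n | X^n, Z^n) + 2n\epsilon_n \\
&\overset{(d)}{\le} I(X^n; Y^n | Z^n) + I(S^n; Y^n | X^n, Z^n) + 2n\epsilon_n \\
&= I(S^n; Y^n | Z^n) + I(X^n; Y^n | S^n, Z^n) + 2n\epsilon_n,
\end{aligned} \tag{81}
$$

where (c) follows because $(K_{\mathrm{A}}, \Phi)$ is a function of $(X^n, M)$ and (d) follows because the channel only depends on $S^n$ so $M - S^n - (X^n, Y^n, Z^n)$. Now the first term in (81) can be upper bounded as follows:

$$
\begin{aligned}
I(S^n; Y^n | Z^n) &= H(Y^n | Z^n) - H(Y^n | S^n, Z^n) \\
&= \sum_{i=1}^{n} H(Y_i | Y^{i-1}, Z^n) - H(Y_i | Y^{i-1}, S^n, Z^n) \\
&\le \sum_{i=1}^{n} H(Y_i | Z_i) - H(Y_i | S_i, Z_i) = \sum_{i=1}^{n} I(S_i; Y_i | Z_i),
\end{aligned} \tag{82}
$$

where the inequality follows because conditioning reduces entropy and from the Markov chain $(Y^{i-1}, Z^{n \setminus i}, S^{n \setminus i}) - (S_i, Z_i) - Y_i$. The second term in (81) can be written as a sum:

$$
I(X^n; Y^n | S^n, Z^n) = \sum_{i=1}^{n} I(X_i; Y_i | S_i, Z_i) \tag{83}
$$

because the channel $p(x, y, z | s)$ is memoryless. Substituting (82) and (83) into (81) yields

$$
\begin{aligned}
nR_{\mathrm{SK}} &\le \sum_{i=1}^{n} \big[ I(S_i; Y_i | Z_i) + I(X_i; Y_i | S_i, Z_i) \big] + 2n\epsilon_n \\
&= \sum_{i=1}^{n} I(X_i, S_i; Y_i | Z_i) + 2n\epsilon_n.
\end{aligned} \tag{84}
$$

The proof can be completed along the lines of the converse proof of Theorem 1. See Section VII-A and in particular the steps leading to (47) for details. $\square$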

VIII. Proofs of Results in Section IV

In this section, we provide the proof of Theorem 6 on the capacity-reliability-secrecy region. This section will be split into three subsections: In the first subsection, we collect some relevant definitions and describe the coding scheme. The second and third subsections contain the proofs of the achievability (lower bounds) of the reliability and secrecy exponents respectively. This proves the achievability of the region $\mathcal{R}(p(s), R_{\Phi}, R_{M})$ defined in (30).

Footnote (to step (d) in the proof of Proposition 4): In fact, (d) holds with equality because $S^n = S^n(M)$ in addition to the stated Markov relationship.

A. Definitions and Coding Scheme 

We start with some definitions to describe the generation of the codewords $s^n(m)$, the key and the public message generation procedures.

Definition 7 (Random code). A $(2^{nR_M}, n)$ random code generated according to $p(s)$ is a random subset of $\mathcal{S}^n$ which contains length-$n$ sequences $s^n(m)$, $m \in [1 : 2^{nR_M}]$, where each sequence $s^n(m)$, called a codeword, is drawn according to the pmf $\prod_{i=1}^n p(s_i)$.

Note that we do not place any cost constraints on $p(s)$ because we assume that $\Gamma = \infty$ in Section IV.

Definition 8 (Random binning function [12]). A $2^{nR}$ random binning function for an alphabet $\mathcal{U}$ is a random map $\psi : u \in \mathcal{U} \mapsto b \in [1 : 2^{nR}]$ that satisfies the following properties:

• Each element $u \in \mathcal{U}$ is independently and uniformly assigned to an element of $[1 : 2^{nR}]$.

• Each pair of different $u, u' \in \mathcal{U}$ is mapped $u \mapsto b$, $u' \mapsto b'$ with probability $2^{-2nR}$ for each pair of elements $b, b' \in [1 : 2^{nR}]$ (not necessarily different).

• The random map $\psi$ is independent of the random code generation process as per Definition 7. More precisely,

$$
P(\{S^n = s^n\} \cap \{\psi(u) = b\}) = P(S^n = s^n)\, P(\psi(u) = b).
$$

The first and second properties in the definition above are known as uniformity and pairwise independence respectively.
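One way to realize a map with the uniformity and pairwise-independence properties is to draw each bin index independently and uniformly. The following sketch does this lazily with Python's standard `random` module. The class name `RandomBinning`, the helper `collision_frequency` and the seeds are illustrative assumptions, and the empirical collision frequency is only a sanity check of the collision probability $\sum_b 2^{-2nR} = 2^{-nR}$ implied by the second property.

```python
import random

class RandomBinning:
    """Lazy i.i.d. uniform assignment of hashable elements to bins 1..num_bins."""

    def __init__(self, num_bins, rng):
        self.num_bins = num_bins
        self._rng = rng
        self._table = {}

    def __call__(self, u):
        # Draw the bin index on first use, then keep it fixed so psi is a function of u.
        if u not in self._table:
            self._table[u] = self._rng.randint(1, self.num_bins)
        return self._table[u]

def collision_frequency(num_bins, trials, seed=0):
    """Fraction of independent binning draws in which two fixed elements collide."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        psi = RandomBinning(num_bins, rng)
        hits += int(psi("u") == psi("u_prime"))
    return hits / trials

# Example: 8 bins, i.e. 2^{nR} = 8; elements can be (m, x^n) pairs as in (85) and (86).
psi = RandomBinning(num_bins=8, rng=random.Random(1))
freq = collision_frequency(num_bins=8, trials=20000)
```

The third property (independence from the code generation) corresponds here to using a random source for the binning that is separate from the one that would draw the codewords.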

In the following, we will define the key generation and public message procedures using random binning functions. Note that this is in contrast to the typicality-based proof of Theorem 1 where we used a deterministic binning procedure (cf. Fig. 10). In the proof of the capacity theorem, we dealt with a set of randomly generated sequences (based on auxiliary random variables) instead of a set of deterministic sequences, and therefore, there is no need to also randomize the binning. See [32, Chapter 12] for further discussions.

We now introduce the notion of a random binning code for the secret key generation protocol (see Section III-A).

Definition 9 (Random binning secret key code). A $(2^{nR_{\mathrm{SK}}}, 2^{nR_M}, 2^{nR_{\Phi}}, n)$ random binning secret key code is a $(2^{nR_M}, 2^{nR_{\Phi}}, n)$ code for the secret key generation protocol in which the public message and key are generated via two independent random binning functions:

$$
\phi : \mathcal{M} \times \mathcal{X}^n \to \varPhi = [1 : 2^{nR_{\Phi}}], \tag{85}
$$

$$
k_{\mathrm{A}} : \mathcal{M} \times \mathcal{X}^n \to \mathcal{K} = [1 : 2^{nR_{\mathrm{SK}}}]. \tag{86}
$$

More precisely, note from (85) that $\phi$ is a $2^{nR_{\Phi}}$ random binning function for alphabet $\mathcal{M} \times \mathcal{X}^n$ and from (86) that $k_{\mathrm{A}}$ is a $2^{nR_{\mathrm{SK}}}$ random binning function for alphabet $\mathcal{M} \times \mathcal{X}^n$.



Codebook Generation and Encoding: Fix $p(s)$. We use a $(2^{nR_{\mathrm{SK}}}, 2^{nR_M}, 2^{nR_{\Phi}}, n)$ random binning secret key code in which the codewords $s^n(m)$, $m \in \mathcal{M}$, belong to a $(2^{nR_M}, n)$ random code generated according to $p(s)$. The codewords and bin assignments are revealed to all parties before communication starts. We emphasize that by construction, this $(2^{nR_{\mathrm{SK}}}, 2^{nR_M}, 2^{nR_{\Phi}}, n)$ code is a $(2^{nR_M}, 2^{nR_{\Phi}}, n)$ code (in the sense of Section III-A with $\Gamma = \infty$) such that secret key rate $R_{\mathrm{SK}}$ is achievable. This is because $K_{\mathrm{A}}$ is uniformly distributed on $[1 : 2^{nR_{\mathrm{SK}}}]$, so the corresponding condition in the definition of achievability is satisfied.

By the definition of $\mathcal{R}(p(s), R_{\Phi}, R_M)$ in (30), it suffices to show that the following two assertions hold true for any $p(s)$:

$$
\liminf_{n \to \infty} -\frac{1}{n} \log P(K_{\mathrm{A}} \ne K_{\mathrm{B}}) \ge E_0(p(s), R_{\Phi}, R_M),
$$

$$
\liminf_{n \to \infty} -\frac{1}{n} \log I(K_{\mathrm{A}}; Z^n, \Phi) \ge F_0(p(s), R_{\mathrm{SK}}, R_{\Phi}, R_M).
$$

This is what we prove in the next two subsections.



B. Proof for the Reliability Exponent 

In this section, we will prove that $E_0$ is an achievable reliability exponent. Recall that Bob has access to his channel output $y^n \in \mathcal{Y}^n$ and the public message $\phi \in \varPhi$, which was generated by Alice in accordance with the random binning function in (85). In order to analyze the error event that Bob's key does not match Alice's,

$$
\mathcal{E}_{\mathrm{key}} \triangleq \{K_{\mathrm{A}} \ne K_{\mathrm{B}}\}, \tag{87}
$$

we stipulate that Bob decodes both Alice's received sequence $x^n \in \mathcal{X}^n$ and Alice's source of randomness $m \in \mathcal{M}$.

We restate the ML-MAP decoding rule in (32): Given $(y^n, \phi)$, Bob declares that $\hat{m}$ is the message selected by Alice and $\hat{x}^n$ is the sequence sent to Alice if the public message bin index of $(\hat{m}, \hat{x}^n)$ agrees with $\phi$, i.e.,

$$
\phi(\hat{m}, \hat{x}^n) = \phi, \tag{88}
$$

and the probabilities satisfy

$$
p(y^n | s^n(\hat{m}))\, p(\hat{x}^n | y^n, s^n(\hat{m})) \ge p(y^n | s^n(\tilde{m}))\, p(\tilde{x}^n | y^n, s^n(\tilde{m})) \tag{89}
$$

for all other pairs $(\tilde{m}, \tilde{x}^n)$ such that $\phi(\tilde{m}, \tilde{x}^n) = \phi$. As mentioned previously, this is a hybrid of an ML and an MAP rule. Observe that if we were just to maximize $p(y^n | s^n(m))$ over $m$, this would correspond to a pure ML decoding rule for the channel $p(y|s)$ as in [11, Sec. 5.6]. If instead we maximize $p(x^n | y^n, s^n(m))$ over $x^n$ given that $m$ is known, this would correspond to a pure MAP decoder for the source $x^n$ given side information $(m, y^n)$ as in [12].

By analyzing the ML-MAP decoder, we now upper bound the probability of event $\mathcal{E}_{\mathrm{key}}$ of the ensemble random binning secret key code $\mathscr{C}$, i.e., $P(\mathcal{E}_{\mathrm{key}}) = \mathsf{E}_{\mathscr{C}}[P(\mathcal{E}_{\mathrm{key}} | \mathscr{C})] = \sum_{\mathcal{C}} p(\mathcal{C}) P(\mathcal{E}_{\mathrm{key}} | \mathscr{C} = \mathcal{C})$. Throughout, we use the notation $\mathscr{C}$ to denote the random code (a random variable) and $\mathcal{C}$ to denote a specific code. Define the error event that Bob decodes either $M$ or $X^n$ incorrectly as

$$
\mathcal{E} \triangleq \{(\hat{M}, \hat{X}^n) \ne (M, X^n)\}. \tag{90}
$$

Clearly, $\mathcal{E}_{\mathrm{key}} \subset \mathcal{E}$. Thus, an upper bound for $P(\mathcal{E})$ also serves as an upper bound for $P(\mathcal{E}_{\mathrm{key}})$. Similarly, a lower bound for the exponent of $P(\mathcal{E})$ is also a lower bound for the exponent of $P(\mathcal{E}_{\mathrm{key}})$. In the interest of tractability, we upper bound $P(\mathcal{E})$ [instead of $P(\mathcal{E}_{\mathrm{key}})$] when the ML-MAP decoder described in (88) and (89) is used.

Footnote (to the random map in Definition 8): More precisely, $\psi(b|u)$ is a matrix of conditional probabilities.

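Consider the probability of error given that $m$ is the message sent, $s^n(m)$ represents the ensemble of codewords associated to $m$ (by the random codebook construction in Definition 7), $x^n$ is Alice's received sequence and $y^n$ is Bob's received sequence. That is, consider

$$
P(\mathcal{E} | y^n, s^n(m), m, x^n) = P\Big( \bigcup_{\tilde{m} \ne m,\ s^n(\tilde{m}),\ \tilde{x}^n \ne x^n} \mathcal{A}(s^n(\tilde{m}), \tilde{m}, \tilde{x}^n) \Big). \tag{91}
$$

In the above error probability, $\mathcal{A}(s^n(\tilde{m}), \tilde{m}, \tilde{x}^n)$ is defined as the error event that the message $\tilde{m} \ne m$, codeword $s^n(\tilde{m})$ and Alice's sequence $\tilde{x}^n \ne x^n$ are selected in such a way that their ML-MAP objective value is at least that of the true parameters $(m, s^n(m), x^n)$, i.e., that $p(y^n | s^n(\tilde{m}))\, p(\tilde{x}^n | y^n, s^n(\tilde{m})) \ge p(y^n | s^n(m))\, p(x^n | y^n, s^n(m))$, and also that $\phi(\tilde{m}, \tilde{x}^n) = \phi(m, x^n)$. Note in (91) that the error event is averaged over all incorrect codewords $s^n(\tilde{m})$ due to the random codebook construction (Definition 7). Now recall the assumption that the binning process is pairwise independent and also independent of the inputs (Definition 8). More precisely,

$$
\begin{aligned}
P(\{S^n = s^n(\tilde{m})\} \cap \{\phi(\tilde{m}, \tilde{x}^n) = \phi(m, x^n)\}) &= P(S^n = s^n(\tilde{m}))\, P(\phi(\tilde{m}, \tilde{x}^n) = \phi(m, x^n)) \\
&= p(s^n(\tilde{m})) \sum_{\phi \in \varPhi} \frac{1}{|\varPhi|^2} = \frac{p(s^n(\tilde{m}))}{|\varPhi|}.
\end{aligned} \tag{92}
$$

Let $\mathbb{1}_{\mathcal{B}}$ be the indicator variable of the set $\mathcal{B}$. By using the definition of $\mathcal{A}(s^n(\tilde{m}), \tilde{m}, \tilde{x}^n)$ and (92), we can upper bound the probability of $\mathcal{A}(s^n(\tilde{m}), \tilde{m}, \tilde{x}^n)$ as follows:

$$
\begin{aligned}
P(\mathcal{A}(s^n(\tilde{m}), \tilde{m}, \tilde{x}^n)) &= \frac{p(s^n(\tilde{m}))}{|\varPhi|}\, \mathbb{1}\{p(\tilde{x}^n, y^n | s^n(\tilde{m})) \ge p(x^n, y^n | s^n(m))\} \\
&\le \frac{p(s^n(\tilde{m}))}{|\varPhi|} \left( \frac{p(y^n | s^n(\tilde{m}))\, p(\tilde{x}^n | y^n, s^n(\tilde{m}))}{p(y^n | s^n(m))\, p(x^n | y^n, s^n(m))} \right)^{t}
\end{aligned}
$$

for all $t > 0$, where the inequality follows because $\mathbb{1}\{a \ge b\} \le (a/b)^t$ for all $t > 0$. Let $\rho \in [0, 1]$. By applying the inequality $P(\bigcup_{i} \mathcal{A}_i) \le [\sum_{i} P(\mathcal{A}_i)]^{\rho}$ [11, pp. 136] to (91), we have

$$
P(\mathcal{E} | y^n, s^n(m), m, x^n) \le \left[ \sum_{\tilde{m} \ne m,\ s^n(\tilde{m}),\ \tilde{x}^n \ne x^n} \frac{p(s^n(\tilde{m}))}{|\varPhi|} \left( \frac{p(y^n | s^n(\tilde{m}))\, p(\tilde{x}^n | y^n, s^n(\tilde{m}))}{p(y^n | s^n(m))\, p(x^n | y^n, s^n(m))} \right)^{t} \right]^{\rho} \tag{93}
$$

for any $\rho \in [0,1]$ and $t > 0$. Now consider the error probability $P(\mathcal{E} | M = m)$ given that message $m$ is chosen by Alice, i.e., $\{M = m\}$ occurs. To bound this error probability, we average over all codewords $s^n(m)$, all observed sequences $y^n$ and all possible sequences received by Alice $x^n$, i.e.,

$$
P(\mathcal{E} | m) = \sum_{y^n} \sum_{s^n(m)} p(y^n | s^n(m))\, p(s^n(m)) \sum_{x^n} p(x^n | y^n, s^n(m))\, P(\mathcal{E} | y^n, s^n(m), m, x^n). \tag{94}
$$

We now substitute the upper bound in (93) into (94). Pulling out $p(x^n | y^n, s^n(m))$ and $p(y^n | s^n(m))$ from the innermost term in (93) (since they do not depend on $\tilde{m}$, $s^n(\tilde{m})$ and $\tilde{x}^n$), we see that $P(\mathcal{E} | m)$ can be upper bounded as

$$
\begin{aligned}
P(\mathcal{E} | m) &\le |\varPhi|^{-\rho} \sum_{y^n} \sum_{s^n(m)} p(s^n(m))\, p(y^n | s^n(m))^{1 - \rho t} \sum_{x^n} p(x^n | y^n, s^n(m))^{1 - \rho t} \\
&\qquad \times \left[ \sum_{\tilde{m} \ne m} \sum_{s^n(\tilde{m})} p(s^n(\tilde{m}))\, p(y^n | s^n(\tilde{m}))^{t} \sum_{\tilde{x}^n \ne x^n} p(\tilde{x}^n | y^n, s^n(\tilde{m}))^{t} \right]^{\rho} \\
&\le |\varPhi|^{-\rho} (|\mathcal{M}| - 1)^{\rho} \sum_{y^n} \Psi_1(y^n, \rho, t)\, \Psi_2(y^n, \rho, t),
\end{aligned} \tag{95}
$$

where the functions $\Psi_1(y^n, \rho, t)$ and $\Psi_2(y^n, \rho, t)$ are defined as follows:

$$
\begin{aligned}
\Psi_1(y^n, \rho, t) &\triangleq \sum_{s^n(m)} p(s^n(m))\, p(y^n | s^n(m))^{1 - \rho t} \sum_{x^n} p(x^n | y^n, s^n(m))^{1 - \rho t}, \\
\Psi_2(y^n, \rho, t) &\triangleq \left[ \sum_{s^n(\tilde{m})} p(s^n(\tilde{m}))\, p(y^n | s^n(\tilde{m}))^{t} \sum_{\tilde{x}^n} p(\tilde{x}^n | y^n, s^n(\tilde{m}))^{t} \right]^{\rho}.
\end{aligned}
$$

Equation (95) follows because $\tilde{m}$ in the line above is a dummy variable that can take on exactly $|\mathcal{M}| - 1$ values and, for each $\tilde{m}$, we generate codewords $s^n(\tilde{m})$ in the same way in the random coding construction. Now notice that if we set $t = 1/(1 + \rho)$, then

$$
\Psi_2(y^n, \rho, 1/(1 + \rho)) = \Psi_1(y^n, \rho, 1/(1 + \rho))^{\rho}
$$

because $\tilde{x}^n$ and $\tilde{m}$ in the definition of $\Psi_2$ are dummy variables. As such, $P(\mathcal{E} | m)$ can be bounded as

$$
P(\mathcal{E} | m) \le |\varPhi|^{-\rho} |\mathcal{M}|^{\rho} \sum_{y^n} \Psi_3(y^n, \rho), \tag{96}
$$

where the function $\Psi_3(y^n, \rho)$ is defined as

$$
\Psi_3(y^n, \rho) \triangleq \left[ \sum_{s^n(m)} p(s^n(m))\, p(y^n | s^n(m))^{1/(1+\rho)} \sum_{x^n} p(x^n | y^n, s^n(m))^{1/(1+\rho)} \right]^{1 + \rho}.
$$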

SUBMITTED TO THE IEEE TRANSACTIONS ON INFORMATION THEORY, MAY 2012 






Now, we recall the DMS and DMBC assumptions, i.e., that

$$
p(s^n(m)) = \prod_{i=1}^{n} p(s_i(m)), \qquad p(x^n, y^n | s^n(m)) = \prod_{i=1}^{n} p(x_i, y_i | s_i(m)).
$$

As a result, $\Psi_3(y^n, \rho)$ simplifies to

$$
\Psi_3(y^n, \rho) = \prod_{i=1}^{n} \left[ \sum_{s_i(m)} p(s_i(m))\, p(y_i | s_i(m))^{1/(1+\rho)} \sum_{x_i} p(x_i | y_i, s_i(m))^{1/(1+\rho)} \right]^{1 + \rho}
$$

and the sum in (96) can be written as a product of single-letterized terms:

$$
\sum_{y^n} \Psi_3(y^n, \rho) = \prod_{i=1}^{n} \sum_{y_i} \Psi_4(y_i, \rho), \tag{97}
$$

where the function $\Psi_4(y, \rho)$ is defined as

$$
\Psi_4(y, \rho) \triangleq \left[ \sum_{s} p(s)\, p(y | s)^{1/(1+\rho)} \sum_{x} p(x | y, s)^{1/(1+\rho)} \right]^{1 + \rho}.
$$

Because each of the codewords is generated identically, each of the terms in the product in (97) is also identical. Hence,

$$
\sum_{y^n} \Psi_3(y^n, \rho) = \Big[ \sum_{y} \Psi_4(y, \rho) \Big]^{n}.
$$

Recall that $|\varPhi| = 2^{nR_{\Phi}}$ and $|\mathcal{M}| = 2^{nR_M}$. In addition, note that $P(\mathcal{E}) = \sum_{m'} p(m') P(\mathcal{E} | m') = P(\mathcal{E} | m)$ for every $m \in \mathcal{M}$. As such, taking the normalized logarithm and limit inferior of (96) yields

$$
\liminf_{n \to \infty} -\frac{1}{n} \log P(\mathcal{E}) \ge \rho (R_{\Phi} - R_M) - \log \sum_{y} \Psi_4(y, \rho).
$$

Essentially, what we have done is to develop a "hybrid" of Gallager-style error exponents for channel coding and lossless source coding with side information. Thus, an achievable error exponent when input distribution $p(s)$ is used is $E_0(p(s), R_{\Phi}, R_M)$ defined in (27). The reliability exponent part of the theorem is proved for the random binning secret key code. $\square$

C. Proof for the Secrecy Exponent 

We now prove that the secrecy exponent is at least Fq using 
the same coding scheme. We can use steps analogous to the 
proof of the direct part of Theorem 2 in 12411 to obtain the 
following bound on the key leakage I{Ka; Z", $). 

Lemma 10. Define c{a) ^ a~-^ log e for < a < 1. The key 
leakage can be bounded as follows: 

I{Ka; Z", $) = E<^[I{Ka; Z", $|<r)] 

<c(a)i/cri<i'i"Ep(-^") E^^™'"^"!-^")^^"' ^^^^ 

for all < a < 1. 



Equation ^ follows because M - S*" - (X",Z") 
form a Markov chain so p(z"|s", m) = p(z"|s") and 
p{x"'\s",z"',m) =p(a;"|s",2:"). Equation ( 1 100b follows from 
the uniformity of the messages m in the message set A4, i.e., 
that p{m) = TjjT for all m G M. We now upper bound 
02(m,a;", z")^"*"". This is done using the following lemma. 

Lemma 11. Let $\{(\lambda_j, a_j)\}$ be a finite collection of non-negative numbers such that $\sum_j \lambda_j = 1$. Also, let $r \ge 1$. Then, the following inequality holds:

$$\left(\sum_{j} \lambda_j a_j\right)^{r} \le \sum_{j} \lambda_j a_j^{r}.$$

This can be proven by noticing that $t \mapsto t^r$ is convex. We omit the details. We now make the following identifications: $a_{s^n} = p(z^n|s^n)\, p(x^n|s^n, z^n)$, $\lambda_{s^n} = p(s^n|m)$ and $r = 1 + \alpha$, and apply Lemma 11 to $\theta_2(m, x^n, z^n)^{1+\alpha}$. This yields the inequality

$$\theta_2(m, x^n, z^n)^{1+\alpha} \le \sum_{s^n} p(s^n|m) \left[p(z^n|s^n)\, p(x^n|s^n, z^n)\right]^{1+\alpha}. \tag{101}$$
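Lemma 11 is Jensen's inequality for the convex map $t \mapsto t^r$, and its proof is omitted above. The following minimal sketch spot-checks the inequality numerically on random instances with $r = 1 + \alpha$; the function name `lemma11_gap` and the random test instances are illustrative assumptions, not part of the paper.

```python
import random

def lemma11_gap(weights, values, r):
    """Return sum_j lam_j * a_j**r - (sum_j lam_j * a_j)**r.

    Lemma 11 states that this gap is non-negative when r >= 1.
    """
    total = sum(weights)
    lam = [w / total for w in weights]  # normalise so the weights sum to one
    power_of_mean = sum(l * a for l, a in zip(lam, values)) ** r
    mean_of_power = sum(l * a ** r for l, a in zip(lam, values))
    return mean_of_power - power_of_mean

# Hypothetical random instances; alpha in (0, 1) gives r = 1 + alpha in (1, 2).
rng = random.Random(0)
gaps = []
for _ in range(1000):
    k = rng.randint(1, 6)
    weights = [rng.random() + 1e-9 for _ in range(k)]
    values = [rng.random() for _ in range(k)]
    alpha = rng.random()
    gaps.append(lemma11_gap(weights, values, 1.0 + alpha))

# A small negative tolerance absorbs floating-point rounding.
print(min(gaps) >= -1e-12)
```

The gap vanishes when all $a_j$ are equal, which is the equality case of Jensen's inequality.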

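On account of (98), (100) and (101), we have

$$\mathbb{E}_{\mathscr{C}}[I(K_A; Z^n, \Phi|\mathscr{C})] \le c(\alpha) |\mathcal{K}|^{\alpha} |\Phi|^{\alpha} |\mathcal{M}|^{-(1+\alpha)} \sum_{z^n} p(z^n)^{-\alpha} \sum_{x^n} \sum_{m} \sum_{s^n} p(s^n|m) \left[p(z^n|s^n)\, p(x^n|s^n, z^n)\right]^{1+\alpha}$$

$$= c(\alpha) |\mathcal{K}|^{\alpha} |\Phi|^{\alpha} |\mathcal{M}|^{-(1+\alpha)} \sum_{s^n, x^n, z^n} \sum_{m} p(s^n, x^n, z^n|m) \left[\frac{p(z^n|s^n)}{p(z^n)}\, p(x^n|s^n, z^n)\right]^{\alpha},$$

where the final equality follows because $p(s^n, x^n, z^n|m) = p(s^n|m)\, p(z^n|s^n)\, p(x^n|s^n, z^n)$ by the Markov chain $M - S^n - (X^n, Z^n)$. Now, pulling the $p(m) = 1/|\mathcal{M}|$ term into the sum, we get

$$\mathbb{E}_{\mathscr{C}}[I(K_A; Z^n, \Phi|\mathscr{C})] \le c(\alpha) |\mathcal{K}|^{\alpha} |\Phi|^{\alpha} |\mathcal{M}|^{-\alpha} \sum_{s^n, x^n, z^n} \sum_{m} p(m)\, p(s^n, x^n, z^n|m) \left[\frac{p(z^n|s^n)}{p(z^n)}\, p(x^n|s^n, z^n)\right]^{\alpha}$$

$$= c(\alpha) |\mathcal{K}|^{\alpha} |\Phi|^{\alpha} |\mathcal{M}|^{-\alpha} \sum_{s^n, x^n, z^n} T(s^n, x^n, z^n, \alpha),$$

where the function $T(s^n, x^n, z^n, \alpha)$ is defined as

$$T(s^n, x^n, z^n, \alpha) \triangleq p(s^n, x^n, z^n) \left[\frac{p(z^n|s^n)}{p(z^n)}\, p(x^n|s^n, z^n)\right]^{\alpha}.$$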

Now, recall that (i) the input $S^n$ is a DMS when averaged over all codebooks and all messages $m \in \mathcal{M}$ (because the generation of the codewords $s^n(m)$, $m \in \mathcal{M}$, is done identically) and (ii) $p(x, y, z|s)$ is a DMBC. Then, we have the upper bound

$$\mathbb{E}_{\mathscr{C}}[I(K_A; Z^n, \Phi|\mathscr{C})] \le c(\alpha) |\mathcal{K}|^{\alpha} |\Phi|^{\alpha} |\mathcal{M}|^{-\alpha} \prod_{i=1}^{n} \sum_{s_i, x_i, z_i} T(s_i, x_i, z_i, \alpha). \tag{102}$$

Note that the bound (102) holds for all $0 < \alpha \le 1$. Recall also that $\mathcal{K} = [1 : 2^{nR_{SK}}]$, $\Phi = [1 : 2^{nR_\Phi}]$ and $\mathcal{M} = [1 : 2^{nR_M}]$, so $|\mathcal{K}|^{\alpha} |\Phi|^{\alpha} |\mathcal{M}|^{-\alpha} = 2^{n\alpha(R_{SK} + R_\Phi - R_M)}$. Now take the normalized logarithm and limit inferior of (102) to get

$$\liminf_{n \to \infty} -\frac{1}{n} \log \mathbb{E}_{\mathscr{C}}[I(K_A; Z^n, \Phi|\mathscr{C})] \ge -\alpha (R_{SK} + R_\Phi - R_M) - \log \sum_{s, x, z} T(s, x, z, \alpha).$$

The joint distribution of $(X, Z, S)$, namely $p(x, z, s) = p(x, z|s)\, p(s)$, is induced by a particular input distribution $p(s)$. Essentially, what we have done in this part of the proof is to develop a "hybrid" of the information leakage exponent for the wiretap channel model [13, Eq. (14)] and the excited source model [24, Theorem 3]. Hence, an achievable exponent for the key leakage given input distribution $p(s)$ is $F_0(p(s), R_{SK}, R_\Phi, R_M)$ defined in (29). The secrecy exponent part of the theorem is proved for the random binning secret key code.
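Both single-letter bounds obtained in the two proofs, $\rho(R_\Phi - R_M) - \log \sum_y \Psi_4(y, \rho)$ and $-\alpha(R_{SK} + R_\Phi - R_M) - \log \sum_{s,x,z} T(s, x, z, \alpha)$, can be evaluated directly on a small alphabet. The sketch below does so for a toy binary model. The input distribution, the crossover probabilities, the rate triple and all function names are hypothetical choices made for illustration only; logarithms are taken to base 2.

```python
import itertools
import math

BITS = (0, 1)
P_S = {0: 0.5, 1: 0.5}  # hypothetical input distribution p(s)

def flip(a, b, eps):
    """Probability that a binary symmetric link with crossover eps maps a to b."""
    return 1.0 - eps if a == b else eps

def p_xyz_given_s(x, y, z, s):
    # Hypothetical DMBC p(x, y, z | s): X is a noisy copy of s, Y and Z of X.
    return flip(s, x, 0.1) * flip(x, y, 0.05) * flip(x, z, 0.2)

def p_xy_given_s(x, y, s):
    return sum(p_xyz_given_s(x, y, z, s) for z in BITS)

def p_xz_given_s(x, z, s):
    return sum(p_xyz_given_s(x, y, z, s) for y in BITS)

def reliability_bound(rho, r_phi, r_m):
    """rho * (R_Phi - R_M) - log2 sum_y Psi_4(y, rho)."""
    total = 0.0
    for y in BITS:
        # p(y|s)^(1/(1+rho)) * p(x|y,s)^(1/(1+rho)) equals p(x,y|s)^(1/(1+rho)).
        inner = sum(P_S[s] * p_xy_given_s(x, y, s) ** (1.0 / (1.0 + rho))
                    for s in BITS for x in BITS)
        total += inner ** (1.0 + rho)
    return rho * (r_phi - r_m) - math.log2(total)

def secrecy_bound(alpha, r_sk, r_phi, r_m):
    """-alpha * (R_SK + R_Phi - R_M) - log2 sum_{s,x,z} T(s, x, z, alpha)."""
    p_z = {z: sum(P_S[s] * p_xz_given_s(x, z, s) for s in BITS for x in BITS)
           for z in BITS}
    total = 0.0
    for s, x, z in itertools.product(BITS, repeat=3):
        p_xz = p_xz_given_s(x, z, s)  # equals p(z|s) * p(x|s,z)
        total += P_S[s] * p_xz * (p_xz / p_z[z]) ** alpha
    return -alpha * (r_sk + r_phi - r_m) - math.log2(total)

# Scan the free parameters on a grid for one hypothetical rate triple (bits).
R_SK, R_PHI, R_M = 0.1, 0.9, 0.3
grid = [k / 20 for k in range(21)]
print(max(reliability_bound(rho, R_PHI, R_M) for rho in grid))
print(max(secrecy_bound(alpha, R_SK, R_PHI, R_M) for alpha in grid[1:]))
```

At $\rho = 0$ and $\alpha = 0$ both sums equal one, so both expressions vanish; this is a convenient sanity check on any implementation. The grid search only approximates the optimization over $\rho$ and $\alpha$.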

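From Random Codes to a Deterministic Code: Combining the proof in Section VIII-B and the proof in this section, we have shown that for the $(2^{nR_{SK}}, 2^{nR_M}, 2^{nR_\Phi}, n)$ random binning secret key code, the expected probability of error decays with exponent (at least) $E_0$ (expectation over codebooks and random binning functions) and the expected key leakage decays exponentially with exponent (at least) $F_0$. Since both are measured with respect to the same (known) channel, there exists a binning secret key code that meets the ensemble behavior. More precisely, observe that $P(\mathcal{E}) = \mathbb{E}_{\mathscr{C}}[P(\mathcal{E}|\mathscr{C})] = \sum_{\mathcal{C}} p(\mathcal{C}) P(\mathcal{E}|\mathscr{C} = \mathcal{C})$, where $\mathcal{C}$ runs through all binning secret key codes (a random code and two random binning functions) and the event $\mathcal{E}$ is defined in (90). By Markov's inequality,

$$P_{\mathscr{C}}\left[P(\mathcal{E}|\mathscr{C}) \ge 3 P(\mathcal{E})\right] \le \frac{1}{3}. \tag{103}$$

Similarly, when averaged over all codes, the average key leakage is $\mathbb{E}_{\mathscr{C}}[I(K_A; Z^n, \Phi|\mathscr{C})] = \sum_{\mathcal{C}} p(\mathcal{C}) I(K_A; Z^n, \Phi|\mathscr{C} = \mathcal{C})$, so by Markov's inequality,

$$P_{\mathscr{C}}\left[I(K_A; Z^n, \Phi|\mathscr{C}) \ge 3\, \mathbb{E}_{\mathscr{C}}[I(K_A; Z^n, \Phi|\mathscr{C})]\right] \le \frac{1}{3}. \tag{104}$$

From (103), by considering the complement of the event of interest, we can conclude that there exists a subset of binning secret key codes $\mathcal{D}_1$ with total probability mass at least $2/3$ (i.e., $\sum_{\mathcal{C} \in \mathcal{D}_1} p(\mathcal{C}) \ge 2/3$) such that $P(\mathcal{E}|\mathscr{C} = \mathcal{C}) < 3 P(\mathcal{E})$ for every $\mathcal{C} \in \mathcal{D}_1$. Similarly, from (104) there exists a subset of binning secret key codes $\mathcal{D}_2$ with total probability mass at least $2/3$ (i.e., $\sum_{\mathcal{C} \in \mathcal{D}_2} p(\mathcal{C}) \ge 2/3$) such that $I(K_A; Z^n, \Phi|\mathscr{C} = \mathcal{C}) < 3\, \mathbb{E}_{\mathscr{C}}[I(K_A; Z^n, \Phi|\mathscr{C})]$ for every $\mathcal{C} \in \mathcal{D}_2$. Note that $P(\mathcal{D}_1 \cap \mathcal{D}_2) \ge 1/3$, so $\mathcal{D}_1 \cap \mathcal{D}_2 \ne \emptyset$. Thus, there exists at least one binning secret key code $\mathcal{C}^*$ in the ensemble of (good) codes $\mathcal{D}_1 \cap \mathcal{D}_2$ such that $P(\mathcal{E}_{\mathrm{key}}|\mathscr{C} = \mathcal{C}^*) \le P(\mathcal{E}|\mathscr{C} = \mathcal{C}^*) \le 2^{-nE_0}$ and $I(K_A; Z^n, \Phi|\mathscr{C} = \mathcal{C}^*) \le 2^{-nF_0}$, where the event $\mathcal{E}_{\mathrm{key}}$ is as defined earlier. □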
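The selection argument above uses nothing beyond Markov's inequality applied twice, followed by a union bound. As an illustration, the sketch below draws two arbitrary non-negative scores per code (stand-ins for the error probability and the key leakage) and checks that at least one third of the codes fall below three times the ensemble mean in both. It assumes a uniform distribution over codes, whereas the argument in the text allows a general $p(\mathcal{C})$; the heavy-tailed scores and the name `good_fraction` are illustrative assumptions.

```python
import random

def good_fraction(errors, leakages):
    """Fraction of codes strictly below three times the mean in both metrics."""
    n = len(errors)
    err_cap = 3.0 * sum(errors) / n
    leak_cap = 3.0 * sum(leakages) / n
    good = sum(1 for e, l in zip(errors, leakages)
               if e < err_cap and l < leak_cap)
    return good / n

# Hypothetical heavy-tailed scores, one pair per code in the ensemble.
rng = random.Random(1)
errors = [rng.paretovariate(1.2) for _ in range(3000)]
leakages = [rng.paretovariate(1.2) for _ in range(3000)]
print(good_fraction(errors, leakages) >= 1 / 3)
```

The bound of one third can be met with equality: with scores `[0, 0, 3]` and `[3, 0, 0]`, exactly one code out of three is good in both respects.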

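Proof of Lemma 10: Recall the assumption that the key and public message binning processes are random, uniform and independent of the random codewords (see Section VIII-A for definitions and the code construction). The key leakage can be expressed as follows:

$$\mathbb{E}_{\mathscr{C}}[I(K_A; Z^n, \Phi|\mathscr{C})] = \mathbb{E}_{\mathscr{C}}[H(K_A|\mathscr{C}) - H(K_A|Z^n, \Phi, \mathscr{C})]$$

$$= \mathbb{E}_{\mathscr{C}}[H(K_A|\mathscr{C}) + H(\Phi|Z^n, \mathscr{C}) - H(K_A, \Phi|Z^n, \mathscr{C})]$$

$$\le \log |\mathcal{K}| + \log |\Phi| - \mathbb{E}_{\mathscr{C}}[H(K_A, \Phi|Z^n, \mathscr{C})]. \tag{105}$$

The conditioning is on the specific codebook used, i.e., $\mathscr{C} = \mathcal{C}$. It remains to lower bound the conditional entropy in (105). For this purpose, let

$$H_{1+\alpha}(X) \triangleq -\frac{1}{\alpha} \log \sum_{x \in \mathcal{X}} p(x)^{1+\alpha} \tag{106}$$

be the Rényi entropy of order $1 + \alpha$ for $0 < \alpha \le 1$. Note that $\lim_{\alpha \to 0} H_{1+\alpha}(X) = H(X)$. Also, by the concavity of $t \mapsto \log t$, it can be verified that $H(X) \ge H_{1+\alpha}(X)$ for all $0 < \alpha \le 1$. Consider the conditional entropy in (105):

$$\mathbb{E}_{\mathscr{C}}[H(K_A, \Phi|Z^n, \mathscr{C})] = \sum_{z^n} p(z^n)\, \mathbb{E}_{\mathscr{C}}[H(K_A, \Phi|Z^n = z^n, \mathscr{C})]$$

$$\ge \sum_{z^n} p(z^n)\, \mathbb{E}_{\mathscr{C}}[H_{1+\alpha}(K_A, \Phi|Z^n = z^n, \mathscr{C})] \tag{107}$$

$$\ge \sum_{z^n} p(z^n) \left(-\frac{1}{\alpha} \log \mathbb{E}_{\mathscr{C}}\left[\sum_{k_A, \phi} p(k_A, \phi|z^n, \mathscr{C})^{1+\alpha}\right]\right). \tag{108}$$

The last inequality is due to the definition of Rényi entropy in (106) and the application of Jensen's inequality, noting that the function $x \mapsto -\log x$ is convex.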
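The two properties of the Rényi entropy in (106) that the proof relies on, namely $H(X) \ge H_{1+\alpha}(X)$ and $H_{1+\alpha}(X) \to H(X)$ as $\alpha \to 0$, are easy to check numerically. The sketch below does this for one hypothetical distribution, with logarithms to base 2; the function names and the distribution are illustrative assumptions.

```python
import math

def shannon_entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in pmf if q > 0)

def renyi_entropy(pmf, alpha):
    """Renyi entropy of order 1 + alpha in bits, following (106)."""
    return -math.log2(sum(q ** (1.0 + alpha) for q in pmf if q > 0)) / alpha

# Hypothetical distribution on four symbols.
pmf = [0.5, 0.25, 0.125, 0.125]
h = shannon_entropy(pmf)
for alpha in (1.0, 0.5, 0.1, 1e-6):
    print(alpha, renyi_entropy(pmf, alpha), h)
```

For this distribution the Rényi entropy increases toward the Shannon entropy as $\alpha$ shrinks, which is why (107) is a lower bound that can be tightened by taking $\alpha$ small.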
Now let $(\tilde{M}, \tilde{X}^n)$ be a pair of random variables identically distributed to, but conditionally independent of, $(M, X^n)$ given the events $\{Z^n = z^n\}$ and $\{\mathscr{C} = \mathcal{C}\}$. Recall that $k(\cdot, \cdot)$ and $\phi(\cdot, \cdot)$ are the key and public message random binning functions respectively. See (85) and (86) for definitions. Define $(\tilde{K}_A, \tilde{\Phi}) \triangleq (k(\tilde{M}, \tilde{X}^n), \phi(\tilde{M}, \tilde{X}^n))$. Then,

(109)

by interpreting the Rényi entropy in (106) in terms of an independent [from $(K_A, \Phi)$] and identically distributed random variable $(\tilde{K}_A, \tilde{\Phi})$.

Define a shorthand notation for the indicator function as

$$\mathbb{1}[k_A, \phi|m, x^n, \mathcal{C}] \triangleq \mathbb{1}[k_{\mathcal{C}}(m, x^n) = k_A,\ \phi_{\mathcal{C}}(m, x^n) = \phi], \tag{110}$$

where $k_{\mathcal{C}}(\cdot)$ and $\phi_{\mathcal{C}}(\cdot)$ are the binning functions associated to a specific codebook $\mathscr{C} = \mathcal{C}$. We upper bound the expectation in the logarithm in (108) on the top of the next page.

The step (111) is a result of plugging (110) into the argument of the logarithm in (108). The step (112) follows by writing out the probability of a collision event in (109) explicitly as a sum. The step in (113) applies the law of total probability. We sum over all possible $(m, x^n)$ that are assigned bin indices $(k_A, \phi)$ for a given pair of binning functions indexed by $\mathscr{C}$. Equation (114) follows by simple reordering of the sums.

The step (115) is an application of Jensen's inequality to the term in brackets $[\,\cdot\,]^{\alpha}$ since the sum over $(k_A, \phi)$ is a sum over the probability mass function $\mathbb{1}[k_A, \phi|m, x^n, \mathcal{C}]$ (cf. (110) for the definition of this indicator function). Also, the function $x \mapsto x^{\alpha}$ is concave for $\alpha \in [0, 1]$. We recall that $m$, $x^n$, and $\mathcal{C}$ are all fixed for this inner sum, the last being fixed by the outer expectation over $\mathscr{C}$. Equation (116) follows from the same reasoning as (113), i.e., the law of total probability. Equation (117) follows by simple reordering of the sums.

In (118), we used the "sifting" property of the indicator function $\mathbb{1}[k_A = k_A', \phi = \phi']$. In (119) we split the sum over $(m', x'^n)$ into two terms and distributed the sums over $(k_A', \phi')$. Note that for the $(m', x'^n) = (m, x^n)$ term, $\sum_{k_A, \phi} \mathbb{1}[k_A, \phi|m, x^n, \mathscr{C}] = 1$. We next applied the inequality $(x + y)^{\alpha} \le x^{\alpha} + y^{\alpha}$, for $0 < \alpha \le 1$, to get (120).

In (121) we note that the first term is not a function of $\mathcal{C}$. Using the concavity of $x \mapsto x^{\alpha}$ (for $\alpha \in [0, 1]$), we move both the sum over $(m, x^n)$ and the expectation over codebooks inside the function, a step justified by Jensen's inequality.

In (122) we apply the uniformly random design of the binning functions. Since $(m, x^n) \ne (m', x'^n)$ for every term in the sum, each of the two bin assignments equals the (fixed) pair $(k_A, \phi)$ with equal probability and independently. Thus, the probability that both equal $(k_A, \phi)$ is the square (by the independence) of the reciprocal of the number of possibilities (by the uniformity), i.e., $\mathbb{E}_{\mathscr{C}}\left[\mathbb{1}[k_A, \phi|m, x^n, \mathscr{C}]\, \mathbb{1}[k_A, \phi|m', x'^n, \mathscr{C}]\right] = (|\mathcal{K}||\Phi|)^{-2}$. In (123), we pulled out $(|\mathcal{K}||\Phi|)^{-\alpha}$. Finally, note that $p(m, x^n|z^n)\, p(m', x'^n|z^n)$ is a well-defined (conditional) probability mass function and that the double sum omits the terms with $(m', x'^n) = (m, x^n)$. Hence, we get (124) by upper bounding the double sum by one.
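The value $(|\mathcal{K}||\Phi|)^{-2}$ used in (122) can be confirmed by brute force on a tiny example. The sketch below enumerates every joint bin assignment of two distinct pairs $(m, x^n) \ne (m', x'^n)$ under uniform, independent binning and counts how often both land in a fixed target bin. Here `num_bins` plays the role of $|\mathcal{K}||\Phi|$; the function name and the sizes are illustrative assumptions.

```python
import itertools
from fractions import Fraction

def joint_hit_probability(num_bins, target):
    """P[bin(u) = target and bin(v) = target] for two distinct items u, v.

    Uniform, independent binning makes every pair of bin labels equally
    likely, so the probability is a ratio of counts.
    """
    hits = 0
    total = 0
    for bin_u, bin_v in itertools.product(range(num_bins), repeat=2):
        total += 1
        if bin_u == target and bin_v == target:
            hits += 1
    return Fraction(hits, total)

# Hypothetical sizes: |K| = 2 keys and |Phi| = 3 public messages give 6 bins.
num_bins = 2 * 3
print(joint_hit_probability(num_bins, target=4) == Fraction(1, num_bins ** 2))
```

Exact rational arithmetic is used so that the comparison with $1/(|\mathcal{K}||\Phi|)^2$ is free of rounding error.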

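Substituting (124) back into (108) gives

$$\mathbb{E}_{\mathscr{C}}[H(K_A, \Phi|Z^n, \mathscr{C})] \ge -\frac{1}{\alpha} \sum_{z^n} p(z^n) \log \left[\frac{1}{|\mathcal{K}|^{\alpha} |\Phi|^{\alpha}} + \sum_{m, x^n} p(m, x^n|z^n)^{1+\alpha}\right]$$

$$\stackrel{(a)}{=} \log(|\mathcal{K}||\Phi|) - \frac{1}{\alpha} \sum_{z^n} p(z^n) \log \left[1 + |\mathcal{K}|^{\alpha} |\Phi|^{\alpha} \sum_{m, x^n} p(m, x^n|z^n)^{1+\alpha}\right]$$

$$\stackrel{(b)}{\ge} \log(|\mathcal{K}||\Phi|) - \frac{\log e}{\alpha} |\mathcal{K}|^{\alpha} |\Phi|^{\alpha} \sum_{z^n} p(z^n) \sum_{m, x^n} p(m, x^n|z^n)^{1+\alpha}, \tag{125}$$

where in (a) we pulled out the $|\mathcal{K}|^{-\alpha} |\Phi|^{-\alpha}$ term from the logarithm above and in (b) we applied the relation $\log(1 + t) \le t \log e$ (recall that $\log = \log_2$). The proof of the lemma is completed by uniting (105) and (125).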